Monday, June 28, 2010

How to Properly Utilize Modern Computers

Holy abstract concept, Batman! Computers are complicated things, especially modern networked ones filled with multiple CPU cores, and anyone professing to know a singular way to utilize them is truly a madman, or a genius, or a little bit of both... but before we can talk about modern computers we must first talk about computers as they used to be.

While this may look prehistoric, it actually happened at Burning Man

Long ago, programming languages were crap and programming was hard. Ken Thompson and Dennis Richie reinvented the way we think about computers by designing not only a new operating system but a new programming language to write that operating system in. It was an ambitious effort that has forever shaped modern computing. Some people don't appreciate it and wax philosophical about hypothetically superior solutions. Those people are retards. Unix rules. Get over it.

No, this isn't quite as good as having sex

Unix had a brilliant underlying philosophy: do one thing and do it well; use multiple processes to solve problems; use text streams as your interface.  The simplicity of the Unix model had a beautiful elegance to it and made it very easy to leverage host resources in a scalable manner.  Instead of writing big clunky monolithic applications, write several small programs that use text streams to talk to each other.  Then if your host just happens to have multiple processors, the kernel can handle the task of farming out multiple jobs to multiple CPUs.

Shells and scripting languages were created to provide the interface and glue to the underlying system utilities. Users could easily queue up a series of tools to analyze and digest text streams as they saw fit.  The interesting thing about this approach is that often times users of these sorts of utilities were performing pure functional programming.  Each utility acts as a function which accepts its input over a pipe and produces output which it sends over a pipe.
A pearl, not to be confused with Perl.  Perl is not a gem.

Perl fit into this ecosystem beautifully.  Perl was focused at making short scripts which work on text streams, while providing easy conversion back and forth between text streams and numbers, since often times the text stream processing you want to do in Unix involves some kind of math.  Perl is an extremely expressive language which allowed people to write far more powerful scripts than anything that had been seen in previous Unix scripting languages.  It was powerful, expressive, and whimsical.  Unfortunately, its whimsy would also be its demise.

Perl's approach didn't scale well to large applications.  The level of abstraction it provided was targeted at writing short scripts within the multiprocess Unix environment.  However, the tide was turning against the entire Unix philosophy.  Monolithic applications and application environments were soon to become the norm.

Java tried to abstract away the underlying operating system.  It was not easy to write Java programs that fit into the traditional Unix philosophy.  Java strongly prefers you talk directly to things in Java Land, and because of that they reinvented standard Unix tools like cron as Quartz. Rather that using the traditional Unix shared-nothing process model to leverage multiple CPUs, Java wants you to use threads.  If you write your entire application this way, you can deploy an application by running a single instance of the Java Virtual Machine and giving it all your system memory.  With a single instance of the JVM you can theoretically utilize all of your available system resources.

Java still got a lot of things wrong.  Threads are one problem (I'll get into that later).  Another is handling application upgrades.  Some environments tried to support hot code swapping, but this usually ended up leaking memory.  In general, the recommended approach for upgrading a Java application is going to be starting and stopping the JVM.  If you happen to be running a network server, such as, say, a web server, this means you have to wait for all clients to disconnect, or you have to shut down without completing their requests.  Depending on the nature of your network protocol, clients may continue to remain connected indefinitely, so upgrades for those types of services typically means mandatory outages.


Unfortunately, both the Unix model and the multithreaded model have warts.

Unix doesn't exactly provide the greatest set of tools for managing multiple processes.  The interprocess signaling model used to manage processes left an awful lot to be desired.  The pipe mechanism used for interprocess communication is rather primitive.  Requiring everything be serialized to text streams incurs a lot of overhead, especially when you write several programs in the same language and can use more efficient data structures than text to communicate data.

In that regard, there are a lot of incentives towards moving to something like Java for concurrent programming.  However, threads have warts too.

The semantics are just plain confusing and the possibility error is huge.  There are a set of best practices which mostly come down to the overriding concern: don't share state between threads.  As long as you never share state between threads there is never any concern over data corruption in concurrent programs.  However, many multithreaded programs share state all over the place, using a collection of highly error-prone synchronization mechanisms to try to keep everything kosher.  However, if you happen to forget to synchronize access to any given piece of shared state, you're screwed, you've just encountered a threading bug.  Sharing state between threads requires extreme vigilance on the part of the programmer, and also intimate knowledge about how threads work and their possible caveats.

Beyond all this, threads are managed by the kernel, and talking to the kernel has high overhead.  A truly amazing feat would be to soup up the Unix model and build your system using lots of shared-nothing processes that communicate using messages and mailboxes rather than primitive text streams.  This is exactly the approach that was taken by Erlang.


Erlang took the whole Unix philosophy to the next level.  Erlang process work like Unix processes, except they use mailboxes and messages instead of pipes.  Unlike threads, Erlang processes run in userspace, which makes them relatively fast.  You can create new Erlang processes a lot faster than you can create threads.  The Erlang VM can run one kernel thread per CPU on your system and load balance processes.  Code can be hot-swapped at runtime in a well-defined manner with extremely consistent semantics.  The entire language philosophy emphasizes the creation of distributed, fault-tolerant, self-healing programs which are able to not only leverage an entire computer, but leverage an entire network of connected computers, using a philosophy which is similar to but an improvement on the Unix approach.

In Erlang, all state is immutable.  This completely eliminates the problems of sharing state between threads.  Due to the way the language is designed it is simply not possible.  This opens up possibilities for Erlang language implementers to safely share state across threads, since the data can't be mutated.  Unfortunately attempts at using this approach in the present Erlang virtual machine have not yet lead to significant performance benefits.

Erlang has its own warts.  For everything it gets right semantically, it is still an aesthetically ugly language.  Very few would describe Erlang code as beautiful.  Despite claims that the semantics, and not the syntax are the barrier to learning Erlang, the main excuse I've heard from people who have avoided Erlang is that they don't like the syntax.
Clojure's logo is so awesome!

Clojure offers a different approach to leveraging modern multiprocessor computers.  It provides shared state that threads can work on transactionally, an approach called Software Transactional Memory (STM), which works kind of like a database.  When you aren't inside a transaction, all state is immutable, which means all state within the language is inherently "thread safe".

Because it's built for the JVM, Clojure is able to take advantage of all the previous effort put into an efficient native threads implementation for the Java programming language.  While this is great for utilizing multicore systems, it's still centered around the notion of shared state.  Distributing your program to multiple computers requires a conceptually different approach than you would ordinarily use to distribute a problem to multiple CPUs.

Beyond that, Clojure uses Lisp syntax.  While some people enjoy writing raw syntax trees because its "homoiconic" nature (not to be confused with Madonna or house music) means they can work all sorts of wonderful wizardry with macros, history has shown that in general most people are not really big fans of that sort of syntax.  Lisp has a lot of parens for a reason: because in most other languages those parens are implicit.

So what's the answer?  How do we "properly" utilize modern computers?  I don't have the answer, only an opinion.

The Unix model was great.  It just lacked a few features to really carry it over to distributed systems.  That said, I really like the idea of running a single VM like the JVM per host, and letting it consume all available system resources running a single application.

Erlang lets you do this, except it provides a Unix-like process model with many of the warts excised.  Erlang has excellent process management, and lets you interact with processes on remote nodes the same way you'd interact with the local system.  Erlang replaced the lousy pipe-based model of interprocess communication with messages, mailboxes, and even filters that allow you to selectively receive from your mailbox.

Erlang provides lightweight, shared-nothing userspace processes and a great way for them to communicate, as well as a scheduler that can dynamically load balance them between native threads and thus host CPUs.  Among many programming experts I've talked to there's a general consensus that having some sort of userspace concurrency context, be it a coroutine or a userspace thread, is a very handy construct to have.  Erlang, perhaps more than any other language out there, has wrapped up userspace concurrency contexts into a very neat little package.

I still feel Erlang's main drawback is its syntax, and I have a few ideas about that.  I think my language Reia brings with it the expressivity of a scripting language like Perl or Ruby combined with the awesome semantics of Erlang which allow it to easily utilize networks of multicore computers.  Reia can support the monolithic one-process-per-application approach so associated with Java while allowing developers to write multiprocess applications internally.  Reia is scripting evolved.


phil pirj said...

Thanks for the post!

Java definitely messed up the whole great JVM idea with its ugly inexpressive C-like syntax, shared-state threads and missing basic features that, for example, C# has for years already.
I haven't tried Scala and not going to do so after one of your previous posts, but it's so pity that there's still no perfect languages on the JVM stage.
I'm not sure if I ever mentionned Kilim, it had the potential to move Erlang aside:

It would be really nice to see something like Scala with message passing at 3x Erlang speed.

I really like Erlang syntax, three days of headache and you are looking at other language syntaxes as they are really missing something (FP, MP, PM).

And after years of Java i'm starting to hate OOP (at least the way it is used in some major projects I have to work with). Erlang is way more flexible.

I hope this year will be productive and will bring more interesting players that will finally capture the flag and will replace mammoths. And I'm pretty sure Reia can be in the lead here.

Barnabus said...

Do any of the modern languages, by default, offload jobs to unused GPU cycles? Mutlicore video cards are a thing now, and unless you're running a game that is using them, it would be nice if the VMs would use all the resources they have available to them.

judess said...

Gotta love people that say computers are complicated, yes its true they are complex but complicated no u can put a computer together from scratch easier then building a model car but we have to look at the past to understand why computers are seemingly becoming less and less compatible with each-other, i ask you to remember Motorized Cars back in the 1960s-1980s if you were even remotely around an engine for one of those Age cars you understand how much easier it was to service those cars yourself and how any part from basically any vehicle could be used for any vehicle parts were almost 100% universal back then there wasn't any "special" piece you needed or the entire system would break down it was general knowledge how to service an engine and that's what the problem was they weren't making any money from servicing your vehicle so they have to make things different and more complex and System Specific so that things aren't Universal and general knowledge and this exact same logic can be applied to the "Advancement of Personal Computers" man what are they up to now windows 11? and im still happy using Windows XP without issues on a 10 year old Frankenstein PC Build using like 20 different systems to strap together i even built my own 2 Fan Wind tunnel to keep it Cold it runs like a beast but you would have to keep this thing at 100% Usage for like 3 months straight to increase the temperature

John Alert said...

Java Training Institutes Java Training Institutes
Java Spring Hibernate Training Institutes in Chennai J2EE Training Institutes in Chennai J2EE Training Institutes in Chennai Core Java Training Institutes in Chennai Core Java Training Institutes in Chennai

Hibernate Online Training Hibernate Online Training Hibernate Training in Chennai Hibernate Training in Chennai Java Online Training Java Online Training Hibernate Training Institutes in ChennaiHibernate Training Institutes in Chennai

John Alert said...

I have read your blog its very attractive and impressive. I like it your blog.

Java Training in Chennai Core Java Training in Chennai Core Java Training in Chennai

Java Online Training Java Online Training Core Java 8 Training in Chennai java 8 online training JavaEE Training in Chennai Java EE Training in Chennai