Unlimited Novelty: Reia and object-aware concurrency

Friday, January 9, 2009

Reia and object-aware concurrency

Concurrency is perhaps the foremost issue today's programmers must learn to cope with. CPU designers have exhausted the level of optimization they can provide to programmers working with a single thread of execution, and more modern designs take transistors which would've ordinarily gone to improve sequential performance and have instead thrown it at more cores. Intel demoed a CPU with 80 cores, yet one third of the transistors seen in a Core 2 Duo. This is the future of processors: more cores that do less. CPU designers are going to devote fewer transistors to each individual core while trying to pack more and more cores onto a single chip. Some crazy people are predicting that the number of CPU cores we see on a single chip is going to grow exponentially. I don't know about that, but it's a reasonable enough guess as any at this point, and it seems like it's where the trends are going.

This is posing a big problem for programmers. Now that all of these cores are available, how do you write programs that can make use of them all? The typical way most programmers are used to dealing with this problem is with threads. Threads let you spread your program's execution across multiple CPU cores, which is great! But threads are error-prone. The synchronization is as hard to get right as manual memory management. Pay attention and use good strategies and it's manageable. Put it in the hands of mediocre programmers and things can quickly fall apart. And once they do it can be incredibly hard to debug.

But it gets worse. Let's assume you've managed to use threads correctly in your program. Great! How well can it scale across an increasing number of CPU cores? The synchronization model employed by most programs uses a locking mechanism to synchronize state. To get a threaded program to scale well, locks must be scrutinized, benchmarked, and made increasingly granular in order for threaded programs to scale well. While this is all well and doable, granular locking is complicated, and the reality is most programs trying to pull it off aren't going to do it right, and for that reason many programmers may avoid it entirely.

So how well will programmers be able to leverage these new multicore CPUs? That question remains up in the air. Threaded approaches are difficult at best, and often hard to scale across an increasing number of CPU cores, especially when complex data structures are involved. For that reason some perceive that programmers are in a sort of crisis in regard to how they will leverage an increasing number of CPU cores.

Some solutions to this problem have emerged. One that gets a lot of attention is software transactional memory. This is a model which looks at memory in a manner similar to a database, using transactions to control concurrent access to shared state. Unfortunately this model has failed to achieve widespread popularity. Theories abound as to what exactly the problem is, generally surrounding its difficulty to apply to real-world problems. So far, this approach to concurrency remains in the collective unconscious as a theoretical one which is rarely applied.

Another approach which has received a considerable amount of hype in certain communities is "Erlang style-concurrency." Erlang is a programming language originally developed for telephony applications which uses lots of small processes that each do one thing and do it well, and talk to each other over message queues. Some liken this to the Unix philosophy: "Write programs that do one thing and do it well. Write programs to work together." However the actual approach of using processes which communicate with messages is known as the Actor model which grew out of Smalltalk.

Erlang processes have lots of nice properties which make concurrency easy. Like OS processes they run simultaneously and the Erlang runtime can distribute them across all available CPU cores with no additional work on the programmer's part. They are pre-emptable: if a process is doing something computationally intensive then after a certain number of "reductions" (e.g. function calls) the Erlang scheduler will pre-empt it and let another process to run. Erlang processes don't share state and can be garbage collected independent of one another. This means there's no "stop the world" condition with the garbage collector. But also, by not sharing state synchronization gets much easier. Now you don't have to synchronize access to shared state, instead you need to synchronize the way your program behaves, which is a far easier problem.

Last year I began working on an Actor model implementation for Ruby called Revactor. It had many of the features seen in Erlang: concurrent "processes" with message boxes and the ability to selectively receive messages from them. However, two things deferred me from continuing to work on Revactor: first, it was slow. Its messaging speed between two processes was nearly two orders of magnitude slower than Erlang, as was the process creation speed. Second, and probably the larger issue, was how much overlap I felt existed between Actors and objects. Actors did many of the same things objects did, but actors and objects did not play well together.

After seeing how much slower Actors in Ruby worked compared to Erlang, I decided to try a different approach entirely. If Actors couldn't come to Ruby, perhaps I could try bringing Ruby to Erlang. In early 2008 I started tinkering with a Ruby-like language on top of Erlang called Reia. Reia brings with it an object model which works on top of the Erlang process and messaging model. Like other object models objects in Reia communicate with messages, but rather than being some metaphorical construct, Reia objects quite literally communicate with messages.

This approach marks a rapid departure from most other object oriented languages. Most object oriented languages have a concurrency primitive like threads which are completely divorced from objects. Threads give you no guarantees about how state is managed or synchronized, and threads have no knowledge of method invocation. In Reia, all objects are concurrent and synchronize around the method dispatch protocol. You can think of each object as being its own thread, except each object's state is independent of all of the others, so there's no need for semaphores or mutexes or critical sections. All synchronization is handled through message dispatch itself, and thanks to the shared-nothing process architecture you only need to worry about synchronizing behavior, not state.

Furthermore, Reia gives you deep hooks into the method dispatch process, allowing objects to asynchronously respond to method calls, to invoke methods asynchronously, or to asynchronously get the response of a method call. This small grab bag of asynchronous tricks allows you to quickly and easily begin adding concurrency into your programs while still preserving the traditional method dispatch approach seen in object oriented languages.

Will this object-aware approach to concurrency and synchronization actually gain traction with programmers? It's hard to say, however it's clear that the other approaches aren't doing so well either. With the number of cores available to programmers skyrocketing, I think it's clear solutions are needed. Only time will tell which ones are actually successful.