Wednesday, May 11, 2011

Introducing Celluloid: a concurrent object framework for Ruby

I've spent a lot of time recently working on a new concurrency library for Ruby called Celluloid. In short, its goal is to make working with threads as easy as working with Ruby objects in most cases, while still remaining loaded with all sorts of power-user features for the edge cases. It's heavily inspired by Erlang and the actor model (hence "celluloid") and represents my best effort to port as many of these concepts as possible to Ruby while making them more Ruby-like. This is the first in what I hope will be a series of blog posts describing various concurrency problems you might encounter in Ruby and how Celluloid can help solve them.

If you're already sold on using threads in Ruby, feel free to skip the next section of this post. However, since threads have long been a pariah in the Ruby world, I feel some explanation is in order as to why you might actually consider using them.

Ruby and Threads

Rubyists generally don't like threads. There are plenty of good reasons to dislike them: they're error-prone for end users, and the original implementation of threads in the Matz Ruby Interpreter was pretty crappy and broken in multiple ways. Even with the latest YARV interpreter found in Ruby 1.9, a global lock prevents multiple threads from executing Ruby code in parallel.

On the flip side, if you need multicore concurrency, Ruby processes are cheap and there are some pretty good libraries like DRb for allowing Ruby VMs to work together. Besides, most people are using Ruby to write stateless webapps that store all state in a database, so you can just run multiple Ruby VMs, each with the same application loaded, to leverage multiple CPUs in a machine.

I used to be in the thread-hater camp, having cut my teeth on multithreaded C programs which left a bitter taste in my mouth, but recently I've changed my tune. This is mainly due to the great work of the JRuby and Rubinius teams to add true multicore concurrency to their Ruby implementations. JRuby has supported multicore concurrency via threads for a while, and Rubinius is adding it in their hydra branch. With these Ruby interpreters, you can run one virtual machine per host and the threads you create will be automatically load balanced among all available CPU cores.

This has immediate benefits for Rails applications that enable thread-safe mode. With thread-safe mode on, each request runs in its own thread, allowing one VM to service multiple requests simultaneously. On interpreters like JRuby and Rubinius Hydra, this means you can run just a single VM per host and your application will utilize all available CPU cores. The memory overhead of loading multiple copies of your application goes away, and as an added benefit you can take advantage of the better garbage collection these VMs (particularly the JVM) offer.

There is a catch: libraries can't safely share state across threads without using some form of thread synchronization. This is often trotted out as a persistent worry by those who prefer to run their Rails apps in the standard single-threaded mode. Those gem authors, who knows what they're doing? Maybe they're using global variables! People don't think about this sort of stuff in Ruby, so shouldn't we just assume that 100% of Ruby libraries aren't thread safe by default?

The truth, at least for things like Rails apps, is that the way they typically operate sidesteps thread safety issues. Ruby as a language favors creating new objects over mutating existing ones, and webapps generally create a new set of objects per request. Because of their stateless nature, they don't provide mechanisms for sharing state between connections, and so they rarely share state between threads.

If you do intend to go thread safe with your Rails app, you should certainly do your due diligence and audit the libraries you use for unsafe use of global and class variables, but in general I think the worries about running Rails apps in multithreaded mode are overblown. Ruby has much better semantics for promoting thread safety than other languages that have made the leap from single-threaded to multithreaded (e.g. C/C++), and those languages have managed the transition, with countless applications running in multithreaded mode.
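As an illustration of what such an audit looks for, here is a minimal sketch of class-level state mutated from several threads. The RequestStats class is hypothetical, not from any real library; guarding each read-modify-write with a Mutex is one straightforward fix.

```ruby
# Hypothetical class-level state of the kind an audit should flag: a counter
# shared by every thread that touches the class.
class RequestStats
  @@count = 0
  @@lock  = Mutex.new

  # Each increment is a read-modify-write; the mutex keeps concurrent
  # increments from interleaving and losing updates.
  def self.record!
    @@lock.synchronize { @@count += 1 }
  end

  def self.count
    @@lock.synchronize { @@count }
  end
end

# Ten "request" threads, each recording 1,000 hits.
threads = Array.new(10) do
  Thread.new { 1_000.times { RequestStats.record! } }
end
threads.each(&:join)

puts RequestStats.count # => 10000
```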

In the two years I've been deploying thread safe Rails applications, I've encountered exactly one thread safety bug, and that was in a library that had originally offered a specific thread safe mode but removed it in later releases, which I unfortunately didn't catch. The fix was simple: just create a thread-specific instance of an object I needed rather than sharing one across all threads. I won't say finding the bug was easy peasy, but all in all I don't think one bug was a bad price to pay for all the benefits of moving to a multithreaded deployment.
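A minimal sketch of that kind of fix, assuming a hypothetical Formatter class whose mutable buffer makes a single shared instance unsafe: each thread lazily builds its own instance and caches it on Thread.current.

```ruby
# Hypothetical library object with mutable internal state, which makes a
# single shared instance unsafe to use from several threads at once.
class Formatter
  def initialize
    @buffer = String.new("")
  end

  def format(text)
    @buffer.clear
    @buffer << "[" << text << "]"
    @buffer.dup
  end
end

# One Formatter per thread, created lazily and cached on Thread.current.
# (Thread#[] is technically fiber-local, which is per thread as long as
# you aren't switching Fibers.)
def formatter
  Thread.current[:formatter] ||= Formatter.new
end

instances = Array.new(4) { Thread.new { formatter } }.map(&:value)

puts instances.uniq.size          # => 4 (one instance per thread)
puts formatter.equal?(formatter)  # => true (reused within a thread)
puts formatter.format("hello")    # => [hello]
```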

Concurrent Objects: How do they work?

Celluloid's concurrent objects work like a combination of normal Ruby objects and threads. You can call methods on them just like normal Ruby objects. To create a concurrent object, declare a class (Sheen, in this post's example) that includes the Celluloid::Actor module.

Then call the spawn method on the class to create an instance.

This creates a new concurrent Charlie Sheen object. Calling the current_status method on it returns the normal value we'd expect from a method call. If an exception is raised, it will likewise be raised in the scope of the caller. But behind the scenes, all these things are happening in a separate thread.
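To make the mechanics concrete, here is a minimal plain-Ruby sketch of the idea. It is not Celluloid's implementation, and ConcurrentProxy and the Sheen methods are hypothetical names: each call is shipped to a dedicated thread through a Queue, the caller waits for the result, and an exception raised in that thread is re-raised in the caller.

```ruby
# Minimal sketch of a "concurrent object" in plain Ruby (not Celluloid):
# every method call runs on one dedicated thread, and the caller waits for it.
class ConcurrentProxy < BasicObject
  def initialize(target)
    @mailbox = ::Queue.new
    ::Thread.new do
      while true
        method_name, args, reply = @mailbox.pop
        begin
          reply << [:ok, target.__send__(method_name, *args)]
        rescue ::StandardError => e
          reply << [:error, e]   # hand the exception back to the caller
        end
      end
    end
  end

  # Any method called on the proxy is forwarded to the target's thread.
  def method_missing(method_name, *args)
    reply = ::Queue.new
    @mailbox << [method_name, args, reply]
    status, value = reply.pop
    ::Kernel.raise value if status == :error   # re-raise in the caller
    value
  end
end

# Hypothetical example class, loosely modeled on the post's Sheen.
class Sheen
  def initialize(name)
    @name = name
  end

  def current_status
    "#{@name} is winning!"
  end

  def worker_thread
    Thread.current
  end

  def meltdown
    raise ArgumentError, "not winning"
  end
end

charlie = ConcurrentProxy.new(Sheen.new("Charlie Sheen"))
puts charlie.current_status                       # => Charlie Sheen is winning!
puts charlie.worker_thread.equal?(Thread.current) # => false

begin
  charlie.meltdown
rescue ArgumentError => e
  puts "raised in caller: #{e.message}"           # => raised in caller: not winning
end
```

Subclassing BasicObject keeps the proxy nearly empty, so almost every method name falls through to method_missing; that is also why Queue, Thread and Kernel are written with a leading `::`.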

Let's say things aren't going so well for Charlie. Instead of winning, Charlie is fired.

How can we help Charlie win again?

Calling Sheen#win! here does something special: it executes the method asynchronously. Adding a ! to the end of any method name sends a message to a concurrent object to execute a method, but doesn't wait for a response and thus will always return nil. You can think of this like signaling an object to do something in the background, and perhaps you'll check on the result later using normal synchronous method calls.

Using a ! to call a method asynchronously follows the Ruby convention that methods ending in a bang are "dangerous." While asynchronous methods do carry certain dangers (namely in how errors are handled), thread-unsafe access to instance variables is not one of them. Charlie is running in his own thread, but there's no need to synchronize access to his private variables. This is where Celluloid's secret sauce comes in.

Charlie maintains a queue of pending method calls and executes them one at a time, in sequence. Celluloid uses an asynchronous messaging layer that you can communicate with using normal Ruby method call syntax. When you call a method on a concurrent object in Celluloid, the "message" you send is quite literal: it takes the form of a request object that waits for a response object (instances of Celluloid::Call and Celluloid::Response, respectively).
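The mailbox idea can be sketched in plain Ruby as well. This is an illustration under simplifying assumptions, not Celluloid's code: MailboxActor is hypothetical, and the Call and Response structs only mimic the roles of Celluloid::Call and Celluloid::Response. Bang calls are queued without waiting, plain calls block for a Response, and because a single thread drains the queue in order, the target's instance variables need no locking.

```ruby
# Simplified stand-ins for the roles of Celluloid::Call and Celluloid::Response.
Call     = Struct.new(:method_name, :args, :reply_to) # reply_to is nil for async calls
Response = Struct.new(:value, :error)

class MailboxActor < BasicObject
  def initialize(target)
    @mailbox = ::Queue.new
    ::Thread.new do
      # One thread handles calls strictly one at a time, in arrival order,
      # so the target's instance variables are never touched concurrently.
      while true
        call = @mailbox.pop
        begin
          response = ::Response.new(target.__send__(call.method_name, *call.args), nil)
        rescue ::StandardError => e
          response = ::Response.new(nil, e)
        end
        call.reply_to << response if call.reply_to
      end
    end
  end

  def method_missing(name, *args)
    if name.to_s.end_with?("!")
      # Async: drop the bang, queue the call, and return nil without waiting.
      @mailbox << ::Call.new(name.to_s.chomp("!").to_sym, args, nil)
      nil
    else
      # Sync: queue the call, then block until its Response arrives.
      reply = ::Queue.new
      @mailbox << ::Call.new(name, args, reply)
      response = reply.pop
      ::Kernel.raise response.error if response.error
      response.value
    end
  end
end

# Hypothetical example class; method names are illustrative.
class Sheen
  attr_reader :status, :wins

  def initialize
    @status = :winning
    @wins   = 0
  end

  def fired
    @status = :fired
  end

  def win
    @status = :winning
    @wins += 1   # unsynchronized, yet safe: only the actor's thread runs this
  end
end

charlie = MailboxActor.new(Sheen.new)
charlie.fired
p charlie.status   # => :fired
p charlie.win!     # => nil (returns immediately)
p charlie.status   # => :winning (queued behind win, so it sees the update)

# 1,000 async calls from 10 threads, with no locks in Sheen.
Array.new(10) { Thread.new { 100.times { charlie.win! } } }.each(&:join)
p charlie.wins     # => 1001
```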

This approach is largely inspired by the gen_server abstraction within the Erlang/OTP framework. For you Erlang nerds who might be worried that Celluloid tries to jam everything into gen_server-shaped boxes, let me say right away that isn't the case, but you will have to wait for a future installment of my blog to find out why.

Celluloid by Example: Parallel Map

Let's start with a simple, practical, real-world example. If you're interested in digging deeper into the guts of Celluloid first, I'd suggest you check out the README. The problem is this: how can we implement a parallel map? That is, how can we reimplement Enumerable#map so that all of the map operations are performed in parallel rather than sequentially?

As this is a contrived and relatively simple problem, I'll first share how you might do it using plain Ruby threads rather than Celluloid.

This version alone is all you need for simple parallel map operations in Ruby, and you can try it out right away in irb.

This pmap implementation behaves just like we'd expect map to. It returns the value of the block for each element if everything succeeds correctly, and raises an exception if anything goes wrong along the way.
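Here is one way such a thread-based pmap might be written; treat it as a sketch rather than the post's exact code. It relies on Thread#value, which waits for a thread to finish and re-raises any exception the thread raised.

```ruby
module Enumerable
  # Parallel map with plain Ruby threads: one thread per element, results
  # collected in the original order. Thread#value waits for each thread and
  # re-raises any exception the block raised inside it.
  def pmap(&block)
    threads = map { |elem| Thread.new(elem) { |e| block.call(e) } }
    threads.map(&:value)
  end
end

p [1, 2, 3].pmap { |n| n * 2 }   # => [2, 4, 6]

begin
  [1, 2, 3].pmap { |n| n == 2 ? raise(ArgumentError, "bad #{n}") : n }
rescue ArgumentError => e
  puts "raised: #{e.message}"    # => raised: bad 2
end
```

On Ruby 2.5 and later, a thread that dies with an exception also prints a warning to stderr by default (see Thread#report_on_exception); the exception still reaches the caller through Thread#value.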

Now I'd like to show you how to refactor this code to fit the concurrent object pattern Celluloid uses. Let's start by representing the same code as an object that performs the computation.
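A sketch of that object-based version, assuming the Mapper class computes its value in initialize (the class layout here is a guess). It is still sequential; it only moves the computation into an object before that object is made concurrent.

```ruby
# Sequential, object-based version: Mapper does its work in initialize.
# The class layout is an assumption for illustration.
class Mapper
  attr_reader :value

  def initialize(arg, &block)
    @value = block.call(arg)
  end
end

module Enumerable
  # Not parallel yet: each Mapper finishes before the next one is created.
  def pmap(&block)
    map { |elem| Mapper.new(elem, &block) }.map(&:value)
  end
end

p [1, 2, 3].pmap { |n| n * 10 }   # => [10, 20, 30]
```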

To turn this into a concurrent object, we first need to include Celluloid::Actor. To achieve concurrency, we need to make the method that performs the computation callable asynchronously. The initialize method is called synchronously by spawn (in case something goes wrong during initialization), so we'll need to create a separate method that actually calls the given block.

After that, we can rewrite Enumerable#pmap using this class.

This creates a new Mapper actor for each element and calls Mapper#run asynchronously on each of them. Once they're all running, we iterate over them again, this time collecting the return values. Since actors can only process one method call at a time, the call to Mapper#value will block until Mapper#run has completed, even though Mapper#run was called asynchronously.
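Without Celluloid, the same behavior can be approximated by giving each Mapper its own thread and mailbox. This is a hypothetical plain-Ruby sketch, not how Celluloid implements it: run! queues the work and returns nil, and value is a synchronous request that waits behind run in the same FIFO queue. A terminate step (also an assumption of this sketch) stops each Mapper's thread once pmap is done, since a thread blocked forever on an empty queue would never be reclaimed.

```ruby
# Hypothetical plain-Ruby Mapper "actor": one thread per Mapper drains a FIFO
# mailbox, so queued calls are handled one at a time, in order.
class Mapper
  def initialize(arg, &block)
    @arg, @block = arg, block
    @value = @error = nil
    @mailbox = Queue.new
    Thread.new do
      loop do
        handler, reply = @mailbox.pop
        break if handler == :stop          # terminate: let the thread exit
        result = __send__(handler)
        reply << result if reply           # async messages have no reply queue
      end
    end
  end

  # Asynchronous: queue the computation and return nil immediately.
  def run!
    @mailbox << [:handle_run, nil]
    nil
  end

  # Synchronous: queued behind handle_run, so this blocks until run is done.
  def value
    reply = Queue.new
    @mailbox << [:handle_value, reply]
    result, error = reply.pop
    raise error if error
    result
  end

  def terminate
    @mailbox << [:stop, nil]
    nil
  end

  private

  def handle_run
    @value = @block.call(@arg)
  rescue StandardError => e
    @error = e                             # surfaced to the caller by #value
  end

  def handle_value
    [@value, @error]
  end
end

module Enumerable
  def pmap(&block)
    mappers = map { |elem| Mapper.new(elem, &block) }
    mappers.each(&:run!)        # start every computation in the background
    mappers.map(&:value)        # then wait for each result, in order
  ensure
    mappers.each(&:terminate) if mappers
  end
end

p [1, 2, 3].pmap { |n| n * n }   # => [1, 4, 9]
```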

This approach of computing a value in the background and blocking only when the value is requested is called a future. You've now seen how to implement a future, but futures are also baked directly into Celluloid itself, so Enumerable#pmap can be implemented with Celluloid::Future as well.

Like Mapper, Celluloid::Future takes arguments, passes them to a block, then runs that block in the background asynchronously. Only when the value is requested does it block the current thread.
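A future can also be sketched with nothing more than one thread per value. SimpleFuture below is a hypothetical stand-in, not Celluloid::Future, and its constructor and value method are assumptions chosen to mirror the description: arguments go in, the block runs in the background, and only reading the value can block.

```ruby
# Hypothetical SimpleFuture (not Celluloid::Future): the block starts running
# in a background thread as soon as the future is created.
class SimpleFuture
  def initialize(*args, &block)
    @thread = Thread.new { block.call(*args) }
  end

  # Blocks only if the result isn't ready yet; Thread#value also re-raises
  # any exception raised by the block.
  def value
    @thread.value
  end
end

module Enumerable
  def pmap(&block)
    futures = map { |elem| SimpleFuture.new(elem, &block) }
    futures.map(&:value)
  end
end

answer = SimpleFuture.new(20, 22) { |a, b| a + b }
# ...the caller is free to do other work here...
p answer.value                    # => 42
p [1, 2, 3].pmap { |n| n + 1 }    # => [2, 3, 4]
```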

Now that we have a pmap function, what can we do with it? How about we compare the time it takes to do a bit of Google Image Search screen scraping for different color images for a particular search term using regular map vs. pmap?

The performance metrics vary across Ruby implementations, but in general the parallel version runs approximately 3.5x faster, and the Celluloid version is 5-10ms slower than the version written with raw Ruby threads.
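To try a comparison like this without hitting Google, one option is to simulate request latency with sleep. In this sketch, fetch_color_results is a hypothetical placeholder that only sleeps, so the speedup it shows comes from that artificial delay and says nothing about the figures quoted above. Because sleep (like blocking network I/O) releases the global lock, the threads overlap even on MRI.

```ruby
# Thread-per-element pmap, repeated here so the example stands alone.
module Enumerable
  def pmap(&block)
    map { |elem| Thread.new(elem) { |e| block.call(e) } }.map(&:value)
  end
end

COLORS = %w[red orange yellow green blue purple].freeze

# Hypothetical placeholder for one image-search request per color: it only
# sleeps to simulate network latency and performs no real HTTP.
def fetch_color_results(term, color)
  sleep 0.2
  "#{color} #{term}"
end

# Returns the block's result and the wall-clock seconds it took.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

sequential, seq_time = timed { COLORS.map  { |c| fetch_color_results("kittens", c) } }
parallel,   par_time = timed { COLORS.pmap { |c| fetch_color_results("kittens", c) } }

puts "map:  #{seq_time.round(2)}s"
puts "pmap: #{par_time.round(2)}s"
puts sequential == parallel   # => true (same results, same order)
```

With six simulated requests, the sequential run should take roughly six delays and the parallel run roughly one; real scraping adds parsing and network variance, which is why measured speedups are smaller.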

While this example is fairly trivial, in the next installment of this blog I'd like to demonstrate how to write a Sinatra-based chat server similar to node_chat.

What does this mean for Revactor?

In 2008 I wrote another actor library called Revactor, based on Fibers and evented I/O. While those ideas have grown increasingly popular, Revactor never did. I attribute this largely to Revactor's API, which was a fairly literal translation of Erlang's APIs into Ruby form with too little focus on putting an easy and accessible face on things. If you saw Mike Perham's recent article on actors in Rubinius (Revactor used a mostly identical API, as did MenTaLguY's Omnibus library), you know the code can be a little daunting, to the point that you might need to learn a little Erlang just to figure out how to use it.

Celluloid is the official successor to Revactor. While Celluloid is primarily built on thread-based actors, it's been designed from the ground up with the idea of eventually incorporating event-based actors as well, which could interoperate with an event library like EventMachine or cool.io. I know I originally dissed Scala for having both thread-based and event-based actors, but short of an Erlang-like process abstraction, it's not a bad compromise.

What about Reia?

One of the projects I've been working on the longest is Reia, a Ruby-like programming language for the Erlang VM. Today happens to be Reia's third birthday, and I do have to say after three years it's not where I thought it would be. It's generally usable but still missing some big features I feel are needed to make it a "complete" language. The main thing I feel is missing from Reia is a concurrent object model, and you can think of Celluloid as being an in-Ruby prototype of how it would work. I started to work on this in the legacy branch of Reia, but felt like it was too big of a project to tackle until I had fleshed out some of the other fundamentals of the language.

After I feel comfortable with how Celluloid is working, I would like to try reimplementing it in Reia. After that, I think Reia will evolve into a truly useful language which bridges the gap between object-oriented languages and Erlang-style concurrency.

I think Celluloid has the potential to be a truly useful library in Ruby on its own, however. It provides most of the underpinnings needed for more Erlang-like concurrent applications without having to switch to a different language.

4 comments:

peak said...

Wonderful!

But there is something wrong -- it seems that "Mapper" objects are not garbage-collected.

E.g. using ruby 1.9.2p180, the following fails:

a = (1 .. 1000).map { |i| i * 8 }
3.times {
b = a.pmap { |n| n / 8 }
p b.length
}

Thanks.
peak@princeton.edu

chris said...

Well Tony, just when I had given up on Ruby as my choice language, you just gave it a new life.

I love the Ruby syntax, but I needed it to be BOTH evented and threaded. Until now, that wasn't possible, or at least not easily.

After reading for about two hours, my guess is that we could probably get 2-3K standard requests/s (I mean not "hello world") out of a normal quad-core box, which is just what I need.

Tony, you're a genius.

My first question is: why hadn't I heard of it before? Something like this should have spread like wildfire.

My guess is:
- Missing 3 examples: one for mysql2, one for redis, and one for HTTP requests (like em-http-request)

- Also, though this is not a big deal, the "future" should be the default, as most people want to program synchronously (res = mysql("select * from tb"); puts res)

Also, I tested your Reel server with ab, and it seems that the first request always hangs under concurrency (don't pay attention to the actual rps, it is an old machine):

Concurrency Level: 100
Time taken for tests: 9.701 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 700000 bytes
HTML transferred: 120000 bytes
Requests per second: 1030.77 [#/sec] (mean)
Time per request: 97.015 [ms] (mean)
Time per request: 0.970 [ms] (mean, across all concurrent requests)
Transfer rate: 70.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  133.9      0   3000
Processing:     0   13  226.3      4   6702
Waiting:        0   13  226.3      4   6701
Total:          1   19  337.2      4   9699

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%     12
  98%     12
  99%     12
 100%   9699 (longest request)

Is it my server, or something you've seen before?

Anyway, I am back to Ruby, thanks to you. Cheers !!!

chris said...

Tony, another thought:

I remember that fibers in JRuby are quite expensive, as each one runs on its own thread.

I think there were plans to fix this with coroutines, but I can't find anything saying that actually happened. Has it? Does rbx treat fibers the same way as JRuby?
