Tuesday, April 5, 2011

Reia: Now with a PEG parser

I haven't blogged about Reia in so long. Mea culpa! It's about this time that people start to think "I haven't heard about Reia in awhile. It must be dead." No, Reia is very much alive my friends, and making some pretty interesting progress... I hope to be able to announce some important new features around Reia's 3rd birthday. Stay tuned for that, but first a preview of things to come...


An experimental branch of Reia is now available which uses Neotoma to generate its parser thanks to some pretty impressive contributions by Graeme Defty. Neotoma is a Parsing Expression Grammar (PEG) based parser generator for Erlang, inspired by Packrat. Neotoma's author Sean Cribbs has also released Neotoma 1.5 with contributions by Graeme Defty which uses Erlang binaries internally and is substantially faster than the previous version.

Why does a PEG matter? PEGs make short work of problems that are extremely complex to handle when you attempt to subdivide the problem of parsing into separate scanning and parsing steps, such as how a traditional lex/yacc (or in Reia's case, leex/yecc) parser would operate. If you've ever attempted to look at the source code of the Ruby parser, you'll find some strange and confusing feedback between the lexer and the parser.

What's this for? Well the foremost reason is: interpolated strings. Ruby allows you to embed arbitrary Ruby code inside of any string using the #{...} syntax. This is a really great feature and one I managed to half-assed implement in Reia because I strongly believe in its awesomeness. However, Reia's implementation is a bit brittle and has a lot of implementations. Just recently I switched to the latest greatest version of leex which ships with the Erlang runtime and had to make some rather arcane changes to some code that uses leex internals, just to continue to support a partially functioning string interpolation mechanism.

Now, toss in some more fun complexities: the awesome /.../ regex literal syntax. We all love it, but why doesn't every language have it? What you may not realize is that this syntax is ambiguous in certain cases with the / and /= operators like it is in other languages like JavaScript. Now, for added fun, toss in the fact that regular expressions can interpolate just like strings, and you're beginning to see the makings of a gramatical nightmare.

All of these things are easy with a PEG. PEGs blend tokenization with parsing and allow comprehension of a much wider range of languages than is possible without pulling your hair out using lex/yacc-like tools, and much more than that, it's a very natural process with PEGs. PEGs are also by nature right-recursive, something that works quite well in immutable state languages like Erlang that have to build their syntax trees from right-to-left due to the nature of immutable singly linked lists. These lists only let you append elements on the left.

This sounds well and good, but unfortunately I have some bad news. Even after a few rounds of performance tuning, using an experimental branch of Neotoma that uses binaries internally instead of lists, the PEG parser is still about an order of magnitude slower than the leex/yecc version. Talking to Neotoma's author Sean Cribbs, it sounds like Neotoma might be doing an excessive amount of unnecessary copying internally. If PEGs pique your interest and you know a bit of Erlang you might take a peek at Neotoma and see if you can find some potential performance optimizations.

I would still like to merge the peg branch of Reia, and the PEG grammar fixes a lot of quirks in the present yecc grammar, but I'm still holding out until the performance is improved to closer to the leex/yecc speeds.

Tuesday, March 22, 2011

Distributed systems and dynamic typing

There's a very good blog post by Robert Harper, a CMU CS professor, called "Dynamic languages are static languages." It's very enlightening and I strongly encourage you to read it. If I understand it correctly, which I freely admit I may not, the general idea is that dynamic languages can be thought of as static languages that have a single all-encompassing type. In that regard, dynamic languages are a proper subset of static languages. If you think I misinterpreted his post and I'm confused, please flame me in the comments.

In a previous post, called "Parallelism is not concurrency", he opines on a pet peeve of mine, namely how the terms parallelism and concurrency are nonchalantly and incorrectly interchanged. Parallelism applies to deterministic operations that operate on similar data in similar time. Some examples of parallel operations include rendering of 3D scenes on GPUs, or encoding/decoding blocks of compressed video. Concurrency, on the other hand, refers to the nondeterministic manner in which distributed systems operate, particularly ones where CPUs are separated over an unreliable network which can lose connectivity at any time.

In a third blog post, "Teaching FP to freshmen", Robert says he'll be teaching Standard ML to freshmen, and further argues he won't be teaching Object Oriented Programming because it's "both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum."

Three very interesting blog posts by the same person, clearly a well-educated computer science professor who knows more than I do. I mean, I like programming languages and I'm working on my own dynamically typed functional programming language, but Robert Harper wrote a book, one I certainly couldn't write. So who am I to judge?

Parallelism and Concurrency are Both Important

What I see missing from Robert Harper's writing is any attention paid to concurrency. He pays intense attention to parallelism, recognizes parallelism as important for the future, and strongly advocates functional languages as a great way to address the problems of parallelism. I have absolutely no bone to pick with him here and wish him great luck with his courses at CMU which address this problem domain.

However, my interests primarily lie in the realm of concurrent computing, and particularly in the area of distributed computing. In the area of distributed computing I find dynamic languages particularly important and think Robert Harper's article on static vs. dynamic languages omits some of the advantages of dynamic languages which make them work better in distributed systems.

This is a weighty subject, and one in which I don't think my own ideas alone make a particularly cogent argument. For some opening remarks, I will defer to Joe Armstrong, creator of the Erlang programming language, and his opinion from the book "Coders at Work" by Peter Seibel. As a bit of context to this quote, Joe is discussing the potential advantages that a static type system might confer upon Erlang. But then he gets to the disadvantages...
...the static type people say, "Well, we really rather like the benefits of dynamic types when we're marshaling data structures." We can't send an arbitrary program down a wire and reconstruct it at the other end because we need to know the type. And we have—Cardelli called it a system that's permanently inconsistent. We have systems that are growing and changing all the time, where the parts may be temporarily inconsistent. And as I change the code in a system, it's not atomic. Some of the nodes change, others don't. They talk to each other—at certain times they're consistent. At other times—when we go over a communication boundary—do we trust that the boundary is correct?
Type Systems and the CAP Theorem

There are two particularly sticky problems when it comes to the use of type in distributed systems. The first is the issue of serialization, or marshaling, of particular states. One way or another this is a solvable problem, both for statically typed and dynamically typed languages. I really don't want to delve too deep into this issue as it distracts from the larger point I'm trying to make, but in general, I think this is an easier problem to solve in dynamic type systems. I would also like to note that serialization formats which vomit their types all over the protocol and the codebase are outgrowths of static type systems. I'm looking at you, CORBA, SOAP, Protocol Buffers, and Thrift. On the flip side, systems which choose a minimal, semi-universal type system, such as JSON, BSON, BERT, and Msgpack, are all outgrowths of dynamic type systems. If I have a point to make here, I think it's that these systems are outgrowths of two very different ways of thinking about the same problem.

Marshaling is still a very important topic in distributed systems. Erlang largely abstracts this problem away from the programmer, allowing distributed nodes to exchange Erlang "terms" between processes on distributed nodes in the exact same way one would exchange messages between two processes located on the same host. The process of serializing that data, transmitting it over the network, receiving it via TCP, decoding it, and sending it to the appropriate Erlang process, is completely transparent to the end user. This is an extraordinarily powerful abstraction.

While statically typed languages can attempt to marshal data in a minimalistic JSON-like type system, this typically isn't the case. Instead, statically typed languages generally seem to prefer to vomit their types all over the protocol. The boilerplate code needed to marshal/unmarshal particular types can be generated automatically by a declaration of the types and methods of a particular protocol, such as the WSDL files used by SOAP. Again, users of statically typed languages could reduce the state of particular entities in their system to one which could fit into a minimalistic type system, but for static languages this is still a manual process, or one which requires manual code generation. In a language like Erlang which is built from the ground up to be distributed, dynamic, and operate around a minimalistic type system, serialization and deserialization can happen completely automatically.

Why is this important in a distributed system? Because, to paraphrase Joe Armstrong, distributed systems are messy. Imagine an Erlang-like distributed system that's statically typed. In order for such a system to work effectively, all nodes in the system must have the exact same code loaded and therefore have a universal consensus on what the types in the system are. This has huge implications on the theoretical properties on such a system. In order for a distributed system to agree on the nature of all types, it must be consistent.

However, if you're familiar with the CAP theorem, you may recognize the inherent problem this may cause. The CAP theorem gives you three options: a consistent fault-tolerant system, a highly available fault-tolerant system, or a consistent highly available system which breaks at the first sign of a fault. Only two of these options provide the consistency needed to ensure universal agreement on the types in the system such that automatic marshaling/unmarshaling of static types is even possible. In a distributed system, you either must give up universal agreement on the types, or sacrifice availability.

To quote Joe again, distributed systems are composed of parts which are "growing and changing all the time" with "parts may be temporarily inconsistent." While there aren't any guarantees that distributed systems built around dynamic type systems will work, inconsistent statically typed systems with disagreements about types are guaranteed not to work. Dynamic systems not only provide the possibility that your system may continue to function with different versions of the code loaded at the same time, but the ability for the programmer to plan for this contingency and offer ways to mitigate it. It's possible this may result in errors, but it may work, whereas incompatible type definitions are universally guaranteed to create errors. In a distributed environment, dynamic type systems provide extensibility, whereas static type systems actively seek to preclude it.

Something I've seen time and time again in systems like SOAP and even Thrift and Protocol buffers is attempts by programmers to escape the constraints of the type system, which almost universally fall into proprietary ways to store key/value pairs. One SOAP API I'm working with now provides "Maps" with "Entry" tags that have a key attribute and an associated value. Another implementation provides an array of "NameValuePair" objects. These solutions seem ugly, but in my opinion, their intentions are not. These are people seeking to extend running systems without the need to completely redefine the protocol. That's very much a practical concern.

Distributed Applications Must Be Flexible

In order for distributed programming to work effectively, nodes need to be able to call functions on each other without the need for programmers to write custom marshaling/demarshaling code for each type. The marshaled data needs to work extensibly, so that nodes running different versions of the code can still talk to each other in a forwards and backwards compatible manner.

Protocols will change over time. Older versions of the code need to work with a newer protocol, and vice versa, older versions of the protocol need to work with newer code. Nodes should be upgraded as practicality dictates. Perhaps your system administrator begins an upgrade and you lose access to a datacenter, because a janitor at your upstream ISP spilled a bucket of mopwater all over their datacenter's raised floor and caused a huge electrical disaster. Now your distributed application is running two different versions of the code, and it's suffered a network partition. Does this mean it should just break when the network partition is fixed?

Erlang has shown us it's possible to recover from these kinds of conditions. Even when we can't change code atomically across the entirety of a distributed application, it should still continue to operate without breaking. Distributed systems should be able to grow and change all the time without rigid boundaries.

Tuesday, March 15, 2011

Relativity for Programmers (A.K.A. Oh CAP!!!)

When you first start to learn physics you typically learn Newtonian Mechanics. Objects have inertia. Force is mass times acceleration. For every action there is an equal and opposite reaction. These all seem straightforward.

Albert Einstein turned the world on its head. Everything is relative. Time is relative. Simultaneity is relative. There is no universal clock.

The same thing happened in the computer science world, not with a bang or the attention that was given relativistic physics, but with a small crowd paying attention. CAP Theorem has the same implications for computer scientists as relativity had for physicists, but no one is really paying attention.

Computer scientists may dream of a fully consistent database distributed across a network, capable of two-phase commit and typical consistency guarantees. Not many think about the implications of such a database.

Imagine you had your database split across two datacenters. Let's call them Foo and Bar. While at one point in time, Foo and Bar could talk, now they can't. Perhaps a clumsy janitor spilled the bucket for his mop, and it destroyed all of your routing hardware between your datacenters. Some customers are accessing Foo and some are accessing Bar. If they want to perform a read, they'll get data that's possibly stale. If they want to perform a write, what happens?

CAP Theorem

CAP Theorem is possibly the most important property of distributed systems that you can possibly understand. It's been motherfucking proven by computer scientists, bitches! So what does it state? The core idea is that consistency which is both instantaneous and global is impossible, except on a single node. As soon as you have more than node storing data for you, you either have to sacrifice availability or consistency. Much like how you can only infer the state of the universe from old light beams hitting your light cone in relativistic physics, distributed systems can only infer the state of other nodes via the protocols they communicate with. These protocols can fall into one of three different forms per the CAP theorem:
  • CP: we have masters you have to listen to, bitch slaves. You can try to change the cluster state but all changes have to go through the masters, and if your master is split, you can't modify any state you bitch submissive node
  • AP: sure, we'll accept reads and writes. We'll try to eventually resolve any conflicts in the event of a netsplit. If we're cool we'll let you specify your own conflict resolution mechanism in the event of conflicting writes, otherwise we'll just do last writer wins.
  • CA: the network broke? Whoa. I was totally not expecting that. I was just happily doing bongrips until you barged in. I'm so high I don't even know what's going on.
If you're writing a datastore you consider to be "CA", you're doing it wrong. Fortunately, most datastores that claim to have "CA" properties are actually CP. That is to say, in the event of a network partition, they sacrifice availability. Reads to the cluster hopefully work, but writes shouldn't. Writes should error or time out for a consistent system. Any system that functions this way is in fact "CP". That is to say: it provides consistency, but in the event of a netsplit, you lose availability, at least for writes. Ideally these systems still provide stale reads.

Calling your system CA is an admission of the fact that if a partition separation occurs, and due to writes each partition has a separate view of the world, when the netsplit resolves your partitions have no mechanism for resolving conflicting states between partitions and thus your datastore is just fucked. Short of a system administrator picking which view of the world they consider to be the best and blowing away the state on the other nodes, the system state is going to be inconsistent and irresolvable.

So what are the alternatives which are partition tolerant and don't break in the event that the janitor spills mopwater on the router that routes packets between your datacenters?

Option #1: sacrifice consistency for availability

This option lets clients read and write from any partition. When connectivity between partitions is restored, there's a plan for resolving all of the writes into a consistent world state. The simplest plan is last writer wins, which is used by Cassandra. Whoever was the last one to alter the cluster state provided the definitive cluster state, simply by virtue of writing last.

In an eventual consistency model, clients may have inconsistent views of the world. This is unfortunately a fact of life. State may unexpectedly change by a particular client's view, because it's being modified on a different partition and those partitions managed to sync up. The road ahead of you may be a beach, but after syncing up with another part of the world, it may turn into a cactus. Such is life.

Option #2: sacrifice availability for consistency

Say you have a master/slave data store, and someone writes to a slave. Wat do? Slave can't talk to master. What's the most obvious option? Error.

This is a "CP" system. Slaves can serve stale state, but if you want to change the state, you have to talk to the master for that state. State can be partitioned/shareded/durr between multiple masters, but if the master for the particular state a slave wants to modify isn't available, sorry, that's a paddlin', or rather an error.

Option #3: highly available, and consistent at the partition level, netsplit fixed wat???

The only other option, the "CA" option, is that each partition pretends that writes to that partition are consistent, but when the netsplits are resolved, there's no consistent way to resolve that state. That is to say, if netsplits occur, each partition is highly available and allows writes, but as soon as the netsplit resolves, the cluster state is broke and irreconcilable.

In a quick survey of NoSQL datastores, Membase, MongoDB, and the hypothetical Redis Cluster all claim to be "CA". Are they really? Or do they just sacrifice availability in the event of a netsplit? If that's true, they're actually "CP", not "CA".

You really don't want to sacrifice partition tolerance. In fact, some might go as far as to say you CAN'T sacrifice partition tolerance. If you think your system is "CA" you probably understand the the CAP theorem and its implications.

CAP is the law, and AP or CP are the only reasonable options 

If you're running an app on a distributed network, things are going to go wrong. You might idealistically claim that you can provide both consistency and availability, sacrificing partition tolerance. You can't.


Can you pick "CA"? Sure, if you're content with losing all your data if the network had any errors. That's probably not what you mean. If you're building software for a fault-tolerant world, it's probably either going to be:
  • Consistent but unavailable during netsplits (CP)
  • "Eventually" consistent but available during netsplits (AP)
The only other option is "broken after netsplits". The best option is probably AP: this is the approach was pioneered by the Amazon Dynamo system which powered Amazon's shopping cart, and by copycat open source systems like Riak and Cassandra, not that Riak and Cassandra aren't awesome too.

Thursday, October 14, 2010

How to make a cheap CI traffic light

We use continuous integration (CI) testing where I work, using Hudson as our CI server. I saw this recent article on the social impacts of making the CI testing status more visible and how that encourages people to fix things that are broken. One way to improve the visibility beyond just having a monitor displaying the CI server status page is to add a large traffic light displaying the status of your CI server. This is an idea I first heard of being used at Digg and have later seen at companies like Github. Below is the Hudson status setup we have at my workplace, showing both the traffic light and the monitor displaying the build status page:


Green means everything built successfully. Yellow means a build is in progress. Red means something is broken. It's a very effective way to let everyone in the office know the status of your builds and encourage people to fix them!

Believe it or not, adding a traffic light to your CI setup is simple, relatively easy, and cheap! Also, if you're like me occasionally breaking into electronics and soldering things is fun! Here's what you'll need to build your own CI traffic light:
  • Lava Lite Traffic Light from Amazon ($19.99)
  • A computer running Linux with a parallel port
  • 4-conductor wire (one for each lamp + ground)
  • A DB-25 connector
  • Soldering supplies (soldering iron, solder, desolder braid, etc)
I bought the Lava Lite Traffic Light based on some reviews from people who used it as a Toastmasters timing light (to let people know when their time speaking is up):


By default it blinks in what the box claims is a "random" pattern but is actually a fixed pattern of red, yellow, green, red, yellow, green. But after opening up the traffic light itself, which only requires a few screws, I found out it was extremely easy to modify.

By default it's controlled by a single IC which is mounted on its own daughterboard, soldered to a small control board which contains the requisite transistors and resistors for powering the lamps and switching them on and off. The first thing you'll need to do is unscrew the control board and desolder the daughterboard from it. Desolder braid is your friend here.

The next step will be following the traces from the lamps back to where the daughterboard is mounted. This is fairly straightforward as the components involved in switching the lamps on and off are in threes. Looking at the control board you should very quickly be able to identify these components. There are three larger blue resistors which limit the current going to the lamps. Don't touch those! There are three transistors which are used to turn the lamps on and off. And finally there are three smaller brown resistors... follow the traces from these resistors to where they used to attach to the daughterboard. These are at TTL voltage, which is good, because so is a parallel port. What we'll be doing is soldering wires onto these three connectors and wiring them directly into a parallel port:


I'm afraid I don't have any fancy schematics for how exactly to do this, but it shouldn't be too hard! Here's a photo of where I soldered onto the control board. You can see the slot in the lower right corner where the daughterboard for the control IC used to be. I used a green wire for the green light, red for red, white for yellow, and black for ground. The circuits that go to ground are large and well labeled, so it should be fairly straightforward to find the points to solder onto:


If you've soldered correctly, you should be able to short any of the lamp wires (or a combination thereof) with the ground wire and they should turn on.

On the other side of your wire, you'll be wanting to solder on a DB25 connector. Wire the three wires which control the lamps to data lines 0, 1, and 2, and the ground to any of the many ground pins available on a parallel port. Here's a picture of mine:


And here's a handy dandy parallel port pinout diagram:


You'll be wanting to solder onto D0, D2, D2, and probably pin 18 (ground). I put green on D0, yellow on D1, and red on D2.

Next, you'll need some software to control the traffic light over the parallel port. To control our traffic light, I wrote a simple UDP server in C that receives 1-byte packets and writes them out to the parallel port. I originally wrote the server in Ruby but garbage collection was hanging up some fun light shows I was trying to put on, so I rewrote it in C for moar realtime. Here is the C source code to the traffic light server:


This uses the /dev/port character device on Linux, so you'll either need to run this server as root or a user specifically configured to have permissions to /dev/port. One thing I soon discovered was that the parallel port is stateful and will remain in the last state you set it to. This means you don't constantly have to write to the parallel port to keep a lamp on. Instead you can just set a state and the parallel port will remain in that state. Another thing I discovered that I still can't explain is that for some reason I needed to write the same byte twice to get the light to show the appropriate state. I don't know why, but it works!

Next, you'll need a client for your traffic light server. Here's a simple one I wrote in Ruby:


Just wait until this script gets passed around the office. Believe me from experience, everyone will go nuts for the first few days playing with the traffic light. Setting the light to an individual color is all, but for a true acid test here's a script I wrote which calls the above one and will make your traffic light rave balls!


But we didn't build this traffic light just to make it rave balls, did we? No, back to business, we need to get it to show the status of the Hudson build server. Here are some examples you can go off of which we're using now. It's a hacked together collection of scripts, to be sure, but it gets the job done:


Just set your Hudson server address (and username and password, if you protect it with HTTP basic auth like we do) and you're ready to go.

With all those scripts combined and a quick cron job to periodically poll Hudson's status, the traffic light will automatically display your build status and let everyone in your office know how diligent you've been about keeping your tests passing.

Now, get to work and build your own traffic light!

Tuesday, August 10, 2010

Multithreaded Rails is generally better than Async Rails, and Rainbows is cooler than Node

Once upon a time Rails was single threaded and could only process one request at a time. This meant for each concurrent request you wanted to process with Rails, you needed to run an entirely separate Ruby VM instance. This was not a good state of affairs, especially in cases where your Rails application was blocking on I/O when talking to a database or other external service. An entire instance of your application sat there useless as it was waiting for I/O to complete.

The lack of multithreading in Rails would lead Ezra Zygmuntowicz to write Merb, a thread-safe web framework for Ruby which certainly borrowed conceptually from Rails and would go on to serve as the core for the upcoming Rails 3.0 release. In the meantime, the earlier Rails 2.x branch would get support for a thread safe mode as well. This meant that web applications written in Ruby could process multiple requests using a single VM instance: while one thread was blocking on a response from a database or other service, the web application could continue processing other requests in other threads.

Even better, while Ruby originally started out with a "green threads" implementation which executed threads in userspace and could not provide multicore concurrency, newer, more modern Ruby implementations emerged which provided true native multithreading. JRuby and IronRuby, implementations of Ruby on the JVM and .NET CLR respectively, provided truly concurrent native multithreading while still maintaining Ruby's original threading API. Rubinius, a clean-room implementation of a Ruby VM based on the Smalltalk 80 architecture, has started to take steps to remove its global lock and allow concurrent multithreading as well.

With a multithreaded web framework like Merb, recent versions of Rails 2.x, or Rails 3.0, in conjunction with a Ruby VM that supports concurrent multithreading, you now need to only run one VM instance with a copy of your web application and it can utilize all available CPU cores in a server, providing true concurrent computation of Ruby code. No longer do you need a "pack of Mongrels" to serve your Rails application. Instead, you can just run a single VM and it will utilize all available system resources. This has enormous benefits in terms of ease-of-deployment, monitoring, and memory usage.

Ruby on Rails has finally grown up and works just like web applications in other more popular languages. You can run just one copy of any Ruby VM that supports native multithreading and utilize all available server resources. Rails deployment is no longer a hack. It Just Works.

But Wait, Threads Are Bad, And Async Is The New Hotness!





Threads have typically had a rather mired reputation in the programming world.  Threads utilize shared state by default and don't exactly provide the greatest mechanisms for synchronizing bits of shared state.  They're a leaky abstraction, and without eternal vigilance on the part of an entire development team and an excellent understanding of what's happening when you use thread synchronization mechanisms, sharing state between threads is error-prone and often difficult to debug.

The "threads are bad" cargo cult has often lead people to pursue "interesting" solutions to various concurrency problems in order to avoid using threads.  Event-based concurrent I/O became an incredibly popular solution for writing network servers, an approach seen in libraries like libevent, libev, Python's Twisted, and in the Ruby world EventMachine and my own event library, Rev.  This scheme uses a callback-driven approach, often with a central reactor core, dispatching incoming I/O asynchronously to various handlers.  For strictly I/O-bound applications, things like static file web servers, proxies, and protocol transformers, this approach is pretty much the best game in town.

Node.js, a pretty awesome I/O layer for Google's V8 JavaScript interpreter, is something of the new hotness.  It's certainly opened up the evented I/O approach to a new audience, and for I/O-bound tasks it provides a way to script in a dynamic language while remaining quite fast.  But as others have noted, Node is a bit overhyped. If you write your server in Node, will it scale? It really depends on the exact nature of the problem.  I'll get into that in a bit.

Ilya Grigorik recently presented at RailsConf and OSCON about em-synchrony, a set of "drivers" for EventMachine which facilitate various types of network I/O which present a synchronous interface but use Fibers to perform I/O asynchronously in the background. He had some rather impressive things to share there, including Rails running on top of EventMachine, dispatching requests concurrently using fibers instead of threads.  This approach won't provide you the computational concurrency that truly multithreaded Rails as in JRuby and IronRuby (and Rubinius soon!), but it will provide you wicked fast I/O performance... at a price.

The New Contract



Programmers generally live in a synchronous world. We call functions which return values. That's the status quo. Some languages go so far as to make this the only possible option. Evented frameworks do not work like this. Evented frameworks turn the world upside down.  For example, in Ruby, where you might ordinarily write something like:

response = connection.request params

In async land, you first have to initiate the request:

begin_request params

Then define a callback in order to receive the response:

def on_response(response)
  ...
end

Rather than calling functions, you initiate side effects which will eventually call one of a number of callbacks.  Exceptions no longer work. The context is lost between callbacks; you always start from just your arguments and have to figure out exactly what you were up to before, which generally necessitates breaking anything complex down into a finite state machine, instead of say, an imperative list of I/O commands to perform. It's a very different approach from the status quo.

The em-synchrony approach promises to save you from this by wrapping up all that ugly callback driven stuff with Fibers. I've been down that road and I no longer recommend it.  In January 2008 I wrote Revactor, a Erlang-like implementation of the Actor Model for Ruby 1.9, using Fibers as the underlying concurrency primitive. It's the first case known to me of someone using this approach, and significantly more powerful than any of the other available frameworks. Where em-synchrony makes you write Fiber-specific code for each network driver, Revactor exposed an incomplete duck type of Ruby's own TCPSocket, which means that patching drivers becomes significantly easier as you don't need asynchronous drivers to begin with.

However, for the most part I stopped maintaining Revactor, largely because I began to think the entire approach is flawed. The problem is frameworks like Revactor and em-synchrony impose a new contract on you: evented I/O only! You aren't allowed to use anything that does any kind of blocking I/O in your system anywhere, or you will hang the entire event loop. This approach works great for something like Node.js, where the entire system was written from the ground-up to be asynchronous, in a language which has a heritage of being asynchronous to begin with.

Not so in Ruby. There are tons and tons of libraries that do synchronous I/O. If you choose to use async Rails, you can't use any library which hasn't specifically been patched with em-synchrony-like async-to-Fiber thunks. Since most libraries haven't been patched with this code, you're cutting yourself off from the overwhelming majority of I/O libraries available. This problem is compounded by the fact that the only type of applications which will benefit from the async approach more than the multithreaded approach are ones that do a lot of I/O.

This is a problem you have to be eternally vigilant about what libraries you use and make absolutely sure nothing ever blocks ever. Hmm, is this beginning to sound like it may actually be as problematic as threads? And one more thing: exceptions. Dealing with exceptions in an asynchronous environment is very difficult, since control is inverted and exceptions don't work in callback mode. Instead, for exceptions to work properly, all of the "Fibered" em-synchrony-like drivers must catch, pass along, and rethrow exceptions. This is left as an exercise to the driver writer.

Threads are Good


Threads are bad when they have to share data.  But when you have a web server handling multiple requests concurrently with threads, they really don't need to share any data at all.  When threads don't share any data, multithreading is completely transparent to the end user. There are a few gotchas in multithreaded Rails, such as some foibles with the initial code loading, but after you get multithreaded Rails going, you won't even notice the difference from using a single thread.  So what cases would Async Rails be better than multithreaded Rails for?  I/O bound cases. For many people the idea of an I/O bound application draws up the canonical Rails use case: a database-bound app.

"Most Rails apps are database bound!" says the Rails cargo cult, but in my experience, useful webapps do things.  That said, Async Rails will have its main benefits over multithreaded apps in scenarios where the application is primarily I/O bound, and a webapp which is little more than a proxy between a user and the database (your typical CRUD app) seems like an ideal use case.

What does the typical breakdown of time spent in various parts of your Rails app look like?  The conventional wisdom would say this:

But even this is deceiving, because models generally do things in addition to querying your database. So really, we need a breakdown of database access time.  Evented applications benefit from being bound on I/O with little computation, so for an Async Rails app this is the ideal use case:

Here our application does negligible computation in the models, views, and controllers, and instead spends all its time making database queries. This time can involve writing out requests, waiting while the database does its business, and consuming the response.

This picture is still a bit vague.  What exactly is going on during all that time spent doing database stuff?  Let's examine my own personal picture of a typical "read" case:



For non-trivial read cases, your app is probably spending a little bit of time doing I/O to make the REQuest, most of its time waiting for the database QueRY to do its magic, and then spending some time reading out the response.

But a key point here: your app is spending quite a bit of time doing nothing but waiting between the request and the response.  Async Rails doesn't benefit you here. It removes some of the overhead for using threads to manage an idle connection, but most kernels are pretty good about managing a lot of sleepy threads which are waiting to be awoken nowadays.

So even in this case, things aren't going to be much better over multithreaded apps, because your Rails app isn't actually spending a lot of time doing I/O, it's spending most of it's time waiting for the database to respond. However, let's examine a more typical use case of Rails:

Here our app is actually doing stuff! It's actually spending a significant amount of time computing, with some time spent doing I/O and a decent chunk spent just blocking until an external service responds. For this case, the multithreaded model benefits you best: all your existing Ruby tools will Just Work (provided they don't share state unsafely), and best of all, when running multithreaded on JRuby or IronRuby (or Rubinius soon!) you can run a single VM image, reduce RAM usage by sharing code between threads, and leverage the entire hardware stack in the way the CPU manufactures intended.

Why You Should Use JRuby

JRuby provides native multithreading along with one of the most compatible alternative Ruby implementations out there, lets you leverage the power of the JVM, which includes a great ecosystem of tools like VisualVM, a mature underlying implementation, some of the best performance available in the Ruby world, a diverse selection of garbage collectors, a significantly more mature ecosystem of available libraries (provided you want to wrap them via the pretty nifty Java Interface), and the potential to deploy your application without any native dependencies whatsoever. JRuby can also precompile all of your Ruby code into an obfuscated Java-like form, allowing you to ship enterprise versions to customers you're worried might steal your source code.  Best of all, when using JRuby you also get to use the incredibly badass database drivers available for JDBC, and get things like master/slave splits and failover handled completely transparently by JDBC. Truly concurrent request handling and awesome database drivers: on JRuby, it Just Works.

Why not use IronRuby? IronRuby also gives you native multithreading, but while JRuby has 3 full time developers working on it, IronRuby only has one. I don't want to say that IronRuby is dying, but in my opinion JRuby is a much better bet. Also, the JVM probably does a better job supporting the platforms of interest for running Rails applications, namely Linux.

Is Async Rails Useful? Kinda.

All that said, are there use cases Async Rails is good for? Sure! If your app is truly I/O bound, doing things like request proxying or a relatively minor amount of computation as compared to I/O (regex scraping comes to mind), Async Rails is awesome. So long as you don't "starve" the event loop doing too much computation, it could work out for you.

I'd really be curious about what kinds of Rails apps people are writing that are extremely I/O heavy though.  To me, I/O bound use cases are the sorts of things people look at using Node for. In those cases, I would definitely recommend you check out Rainbows instead of Async Rails or Node.  More on that later...

Why I Don't Like EventMachine, And Why You Should Use Rev (and Revactor) Instead

em-synchrony is built on EventMachine. EventMachine is a project I've been using and have contributed to since 2006. I really can't say I'm a fan. Rather than using Ruby's native I/O primitives, EventMachine reinvents everything. The reason for this is because its original author, Francis "Garbagecat" Cianfrocca, had his own libev(ent)-like library, called "EventMachine", which was written in C++. It did all of its own I/O internally, and rather than trying to map that onto Ruby I/O primitives, Francis just slapped a very poorly written Ruby API onto it, foregoing any compatibility with how Ruby does I/O. There's been a lot of work and refactoring since, but even so, it's not exactly the greatest codebase to work with.

While this may have been remedied since last I used EventMachine, a key part of the evented I/O contract is missing: a "write completion" callback indicating that EventMachine has emptied the write buffer for a particular connection. This has lead to many bugs in cases like when proxying from a fast writer to a slow reader, the entire message to be proxied is taken into memory. There are all sorts of special workarounds for common use cases, but that doesn't excuse this feature being missing from EventMachine's I/O model.

It's for these reasons that I wrote Rev, a Node-like evented I/O binding built on libev. Rev uses all of Ruby's own native I/O primitives, including Ruby's OpenSSL library. Rev sought to minimize the amount of native code in the implementation, with as much written in Ruby as possible. For this reason Rev is slower than EventMachine, however the only limiting factor is developer motivation to benchmark and rewrite the most important parts of Rev in C instead of Ruby. Rev was written from the ground up to perform well on Ruby 1.9, then subsequently backported to Ruby 1.8.

Rev implements a complete I/O contract including a write completion event which is used by Revactor's Revactor::TCP::Socket class to expose an incomplete duck type of Ruby's TCPSocket.  This should make monkeypatching existing libraries to use Revactor-style concurrency much easier.  Rather than doing all the em-synchrony-style Fiber thunking and exception shuffling yourself, it's solved once by Revactor::TCP::Socket, and you just pretend you're doing normal synchronous I/O.

Revactor comes with all sorts of goodies that people seem to ask for often. Its original application was for a web spider, which in early 2008 was sucking down and scanning regexes on over 30Mbps of data using four processes running on a quad core Xeon 2GHz. I'm sure it was, at the time, the fastest concurrent HTTP fetcher ever written in Ruby. Perhaps a bit poorly documented, this HTTP fetcher is part of the Revactor standard library, and exposes an easy-to-use synchronous API which scatters HTTP requests to a pool of actors and gathers them back to the caller, exposing simple callback-driven response handling. I hear people talking about how awesome that sort of thing is in Node, and I say to them: why not do it in Ruby?

Why Rainbows Is Cooler Than Node

Both Rev and Revactor-style concurrency are provided by Eric Wong's excellent Rainbows HTTP server. Rainbows lets you build apps which handle the same types of use cases as Node, except rather than having to write everything in upside async down world in JavaScript, using Revactor you can write normal synchronous Ruby code and have everything be evented underneath. Existing synchronous libraries for Ruby can be patched instead of rewritten or monkeypatched with gobs of Fiber/exception thunking methods.

Why write in asynchronous upside down world when you can write things synchronously? Why write in JavaScript when you can write in Ruby? Props to everyone who has worked on solutions to this problem,  and to Ilya for taking it to the mainstream, but in general, I think Rev and Revactor provide a better model for this sort of problem.

Why I Stopped Development on Rev and Revactor: Reia

A little over two years ago I practically stopped development on Rev and Revactor. Ever since discovering Erlang I thought of it as a language with great semantics but a very ugly face. I started making little text files prototyping a language with Ruby-like syntax that could be translated into Erlang. At the time I had outgrown my roots as an I/O obsessed programmer and got very interested in programming languages, how they work, and had a deep desire to make my own.

The result was Reia, a Ruby-like language which runs on top of the Erlang VM. I've been working on it for over two years and it's close to being ready! It's got blocks! It's got Ruby-like syntax! Everything is an (immutable) object! All of the core types are self-hosted in Reia. It's got a teensy standard library. Exceptions are kind of working. I'd say it's about 75% of the way to its initial release. Soon you'll be able to write CouchDB views with it.

Erlang's model provides the best compromise for writing apps which do a lot of I/O but also do a lot of computation as well. Erlang has an "evented" I/O server which talks to a worker pool, using a novel interpretation of the Actor model. Where the original Actor model was based on continuations and continuation passing, making it vulnerable to the same "stop the world" scenarios if anything blocks anywhere, Erlang chose to make its actors preemptive, more like threads but much faster because they run in userspace and don't need to make a lot of system calls.

Reia pursues Erlang's policy of immutable state systemwide. You cannot mutate state, period. This makes sharing state a lot easier, since you can share a piece of state knowing no other process can corrupt it. Erlang  uses a model very similar to Unix: shared-nothing processes which communicate by sending "messages" (or in the case of Unix, primitive text streams).  For more information on how Erlang is the evolution of the Unix model, check out my other blog post How To Properly Utilize Modern Computers, which spells out a lot of the same concepts I've discussed in this post more abstractly.  Proper utilization of modern computers is exactly what Reia seeks to do well.

Reia has been my labor of love for over two years. I'm sorry if Rev and Revactor have gone neglected, but it seems I may have just simply been ahead of my time with them, and only now is Ruby community interest in asynchronous programming piqued by things like Node and em-synchrony. I invite you to check out Rev, Revactor, and Reia, as well as fork them on Github and start contributing if you have any interest in doing advanced asynchronous programming on Ruby 1.9.

Tuesday, June 29, 2010

Reia: Pluggable Parsers

One stand-out quality of the Ruby community is a fascination with obtaining and manipulating Ruby parse trees.  Such a fascination exists in many languages, but it's particularly weird in Ruby because until Ruby 1.9 there was no first-class way to obtain a Ruby parse tree.  People went spelunking with C code into Ruby's internals, ripping the parse tree right out and exposing it back to the Ruby environment.  Eventually Ruby parsers were implemented in Ruby itself in various projects.  Yet it remains that while Ruby as a language seems to attract parse tree tinkerers, the language itself does not provide first-class ways to satisfy their needs.

I firmly believe that being able to obtain a parse tree for the programming language you're using is important and should be a first-class language feature.  To that end, Reia supports a String#parse method:

>> "2+2".parse()
=> [(:binary_op,1,:'+',(:integer,1,2),(:integer,1,2))]

This parses the "2+2" string as Reia source code.  The result might remind you a little bit of Lisp: it's a Reia parse tree.  Right now there aren't immediate uses for Reia parse trees, but I'd soon like to add an interface for compiling/executing them.  Erlang supports a feature called "parse transforms" which allow on-the-fly transformations of Erlang syntax.  I'd also like to add such a feature to Reia.

If String#parse were just used to parse Reia source code it'd be a bit of a waste.  However, it can be used for more than just that.  For example, parsing JSON (as of tonight):

>> '{"foo": [1,2,3], "bar": [4,5,6]}'.parse(:json)    
=> {"foo"=>[1,2,3],"bar"=>[4,5,6]}

After some recent problems dealing with JSON libraries in Ruby, I really felt JSON parsing should be part of the standard library.  With this syntax, it almost feels like JSON parsing is part of the core language.  Rubyists generally implement that sort of thing by monkeypatching the core types.  Reia lets anyone define their own String#parse method by defining special module names, with no modifications to the core types required (which Reia doesn't let you do anyway).

To better understand how this works, let's take a look at how Reia implements String#parse:

def parse(format)
  "#{format.to_s().capitalize()}Parser".to_module().parse(self)
end

Given a format of :foobar, String#parse will capitalize the argument into "Foobar", then look for a "FoobarParser" module to parse itself with.  This means anyone can add a parser to the language just by defining a module name that ends with "Parser" and has a parse method which accepts a string as an argument.

In short, anyone can add a parser to the language which can be called with a simple, elegant syntax.  No monkeypatching required.

Monday, June 28, 2010

How to Properly Utilize Modern Computers


Holy abstract concept, Batman! Computers are complicated things, especially modern networked ones filled with multiple CPU cores, and anyone professing to know a singular way to utilize them is truly a madman, or a genius, or a little bit of both... but before we can talk about modern computers we must first talk about computers as they used to be.

While this may look prehistoric, it actually happened at Burning Man

Long ago, programming languages were crap and programming was hard. Ken Thompson and Dennis Richie reinvented the way we think about computers by designing not only a new operating system but a new programming language to write that operating system in. It was an ambitious effort that has forever shaped modern computing. Some people don't appreciate it and wax philosophical about hypothetically superior solutions. Those people are retards. Unix rules. Get over it.

No, this isn't quite as good as having sex


Unix had a brilliant underlying philosophy: do one thing and do it well; use multiple processes to solve problems; use text streams as your interface.  The simplicity of the Unix model had a beautiful elegance to it and made it very easy to leverage host resources in a scalable manner.  Instead of writing big clunky monolithic applications, write several small programs that use text streams to talk to each other.  Then if your host just happens to have multiple processors, the kernel can handle the task of farming out multiple jobs to multiple CPUs.

Shells and scripting languages were created to provide the interface and glue to the underlying system utilities. Users could easily queue up a series of tools to analyze and digest text streams as they saw fit.  The interesting thing about this approach is that often times users of these sorts of utilities were performing pure functional programming.  Each utility acts as a function which accepts its input over a pipe and produces output which it sends over a pipe.
A pearl, not to be confused with Perl.  Perl is not a gem.

Perl fit into this ecosystem beautifully.  Perl was focused at making short scripts which work on text streams, while providing easy conversion back and forth between text streams and numbers, since often times the text stream processing you want to do in Unix involves some kind of math.  Perl is an extremely expressive language which allowed people to write far more powerful scripts than anything that had been seen in previous Unix scripting languages.  It was powerful, expressive, and whimsical.  Unfortunately, its whimsy would also be its demise.

Perl's approach didn't scale well to large applications.  The level of abstraction it provided was targeted at writing short scripts within the multiprocess Unix environment.  However, the tide was turning against the entire Unix philosophy.  Monolithic applications and application environments were soon to become the norm.
OHAI!!!


Java tried to abstract away the underlying operating system.  It was not easy to write Java programs that fit into the traditional Unix philosophy.  Java strongly prefers you talk directly to things in Java Land, and because of that they reinvented standard Unix tools like cron as Quartz. Rather that using the traditional Unix shared-nothing process model to leverage multiple CPUs, Java wants you to use threads.  If you write your entire application this way, you can deploy an application by running a single instance of the Java Virtual Machine and giving it all your system memory.  With a single instance of the JVM you can theoretically utilize all of your available system resources.

Java still got a lot of things wrong.  Threads are one problem (I'll get into that later).  Another is handling application upgrades.  Some environments tried to support hot code swapping, but this usually ended up leaking memory.  In general, the recommended approach for upgrading a Java application is going to be starting and stopping the JVM.  If you happen to be running a network server, such as, say, a web server, this means you have to wait for all clients to disconnect, or you have to shut down without completing their requests.  Depending on the nature of your network protocol, clients may continue to remain connected indefinitely, so upgrades for those types of services typically means mandatory outages.

EWWWWW!!!!

Unfortunately, both the Unix model and the multithreaded model have warts.

Unix doesn't exactly provide the greatest set of tools for managing multiple processes.  The interprocess signaling model used to manage processes left an awful lot to be desired.  The pipe mechanism used for interprocess communication is rather primitive.  Requiring everything be serialized to text streams incurs a lot of overhead, especially when you write several programs in the same language and can use more efficient data structures than text to communicate data.

In that regard, there are a lot of incentives towards moving to something like Java for concurrent programming.  However, threads have warts too.

The semantics are just plain confusing and the possibility error is huge.  There are a set of best practices which mostly come down to the overriding concern: don't share state between threads.  As long as you never share state between threads there is never any concern over data corruption in concurrent programs.  However, many multithreaded programs share state all over the place, using a collection of highly error-prone synchronization mechanisms to try to keep everything kosher.  However, if you happen to forget to synchronize access to any given piece of shared state, you're screwed, you've just encountered a threading bug.  Sharing state between threads requires extreme vigilance on the part of the programmer, and also intimate knowledge about how threads work and their possible caveats.

Beyond all this, threads are managed by the kernel, and talking to the kernel has high overhead.  A truly amazing feat would be to soup up the Unix model and build your system using lots of shared-nothing processes that communicate using messages and mailboxes rather than primitive text streams.  This is exactly the approach that was taken by Erlang.

I AM ERLANG!!!

Erlang took the whole Unix philosophy to the next level.  Erlang process work like Unix processes, except they use mailboxes and messages instead of pipes.  Unlike threads, Erlang processes run in userspace, which makes them relatively fast.  You can create new Erlang processes a lot faster than you can create threads.  The Erlang VM can run one kernel thread per CPU on your system and load balance processes.  Code can be hot-swapped at runtime in a well-defined manner with extremely consistent semantics.  The entire language philosophy emphasizes the creation of distributed, fault-tolerant, self-healing programs which are able to not only leverage an entire computer, but leverage an entire network of connected computers, using a philosophy which is similar to but an improvement on the Unix approach.

In Erlang, all state is immutable.  This completely eliminates the problems of sharing state between threads.  Due to the way the language is designed it is simply not possible.  This opens up possibilities for Erlang language implementers to safely share state across threads, since the data can't be mutated.  Unfortunately attempts at using this approach in the present Erlang virtual machine have not yet lead to significant performance benefits.

Erlang has its own warts.  For everything it gets right semantically, it is still an aesthetically ugly language.  Very few would describe Erlang code as beautiful.  Despite claims that the semantics, and not the syntax are the barrier to learning Erlang, the main excuse I've heard from people who have avoided Erlang is that they don't like the syntax.
Clojure's logo is so awesome!

Clojure offers a different approach to leveraging modern multiprocessor computers.  It provides shared state that threads can work on transactionally, an approach called Software Transactional Memory (STM), which works kind of like a database.  When you aren't inside a transaction, all state is immutable, which means all state within the language is inherently "thread safe".

Because it's built for the JVM, Clojure is able to take advantage of all the previous effort put into an efficient native threads implementation for the Java programming language.  While this is great for utilizing multicore systems, it's still centered around the notion of shared state.  Distributing your program to multiple computers requires a conceptually different approach than you would ordinarily use to distribute a problem to multiple CPUs.

Beyond that, Clojure uses Lisp syntax.  While some people enjoy writing raw syntax trees because its "homoiconic" nature (not to be confused with Madonna or house music) means they can work all sorts of wonderful wizardry with macros, history has shown that in general most people are not really big fans of that sort of syntax.  Lisp has a lot of parens for a reason: because in most other languages those parens are implicit.


So what's the answer?  How do we "properly" utilize modern computers?  I don't have the answer, only an opinion.

The Unix model was great.  It just lacked a few features to really carry it over to distributed systems.  That said, I really like the idea of running a single VM like the JVM per host, and letting it consume all available system resources running a single application.

Erlang lets you do this, except it provides a Unix-like process model with many of the warts excised.  Erlang has excellent process management, and lets you interact with processes on remote nodes the same way you'd interact with the local system.  Erlang replaced the lousy pipe-based model of interprocess communication with messages, mailboxes, and even filters that allow you to selectively receive from your mailbox.

Erlang provides lightweight, shared-nothing userspace processes and a great way for them to communicate, as well as a scheduler that can dynamically load balance them between native threads and thus host CPUs.  Among many programming experts I've talked to there's a general consensus that having some sort of userspace concurrency context, be it a coroutine or a userspace thread, is a very handy construct to have.  Erlang, perhaps more than any other language out there, has wrapped up userspace concurrency contexts into a very neat little package.

I still feel Erlang's main drawback is its syntax, and I have a few ideas about that.  I think my language Reia brings with it the expressivity of a scripting language like Perl or Ruby combined with the awesome semantics of Erlang which allow it to easily utilize networks of multicore computers.  Reia can support the monolithic one-process-per-application approach so associated with Java while allowing developers to write multiprocess applications internally.  Reia is scripting evolved.