Friday, August 10, 2012

Debunking the Node.js Gish Gallop

A programmer who was once a Ruby on Rails enthusiast switches to Node.js and thinks it's awesome, then proceeds to write a blog post about why Node is the bee's knees and Rails is crap. Attention is drawn to the changing nature of web design, from web pages with server-generated HTML to single-page JS-heavy apps written using Backbone, Ember, etc. Stop me if you think that you've heard this one before...

This is an argument I keep hearing over and over, and as far as I'm concerned it's nothing but a Gish Gallop of completely specious arguments. But I really worry... I worry because I keep hearing it over and over, and that makes me worry that people are actually believing it. I'm not sure if people are running into problems, reading some of the prevailing "wisdom", and coming to the same conclusion, or what. This really makes me sad, because whenever I read posts like this, I can feel my previous passion for these same ideas. But for me that was half a lifetime ago, and my opinions have changed. I have been down these roads, over mountains, blazed my own trails, and then realized how stupid I was...

How do you defeat the Gish Gallop? I don't really enjoy doing this, but as far as I can tell there is no other way: we must go through the arguments one by one and show why they are completely ludicrous. So here we go...

In case you were confused, Rails is AWESOME for JSON APIs and single page applications

I love client-heavy HTML5/JS apps. I don't want every page on the web to be one, but there are many applications that can benefit a ton from keeping all of their state in the browser. In general: if you can do something without having to go across the network to do it, you will provide a better user experience, bar none.

The primary thing these applications crave is an awesome JSON API (and WebSockets... stay tuned). So why should you use Rails for a JSON API? Isn't Rails designed for HTML/JS pages? What benefit does Rails give you for building JSON APIs? And isn't Rails really slow?

Well no, I've been through this before. If you are building API-only applications with a single-page HTML5/JS frontend, you should definitely check out Rails::API. Rails::API completely eliminates any ActionView-centrism you may be worried about in Rails, and gives you awesome tools for building JSON APIs, like ActiveModel::Serializers. But that alone can't express what Rails brings to the table, so here is a list of features Rails provides which are useful for JSON APIs, courtesy of the Rails::API README:

Handled at the middleware layer:

  • Reloading: Rails applications support transparent reloading. This works even if your application gets big and restarting the server for every request becomes non-viable.
  • Development Mode: Rails applications come with smart defaults for development, making development pleasant without compromising production-time performance.
  • Test Mode: Ditto test mode.
  • Logging: Rails applications log every request, with a level of verbosity appropriate for the current mode. Rails logs in development include information about the request environment, database queries, and basic performance information.
  • Security: Rails detects and thwarts IP spoofing attacks and handles cryptographic signatures in a timing attack aware way. Don't know what an IP spoofing attack or a timing attack is? Exactly.
  • Parameter Parsing: Want to specify your parameters as JSON instead of as a URL-encoded String? No problem. Rails will decode the JSON for you and make it available in params. Want to use nested URL-encoded params? That works too.
  • Conditional GETs: Rails handles conditional GET (ETag and Last-Modified) processing, parsing request headers and returning the correct response headers and status code. All you need to do is use the stale? check in your controller (see the sketch just after this list), and Rails will handle all of the HTTP details for you.
  • Caching: If you use dirty? with public cache control, Rails will automatically cache your responses. You can easily configure the cache store.
  • HEAD requests: Rails will transparently convert HEAD requests into GET requests, and return just the headers on the way out. This makes HEAD work reliably in all Rails APIs.
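
To make the conditional GET point concrete, here is a minimal controller sketch (model and routes assumed) showing the stale? check in action:

class PostsController < ApplicationController
  # stale? compares the client's If-None-Match / If-Modified-Since
  # headers against the record and automatically renders
  # 304 Not Modified when nothing has changed.
  def show
    @post = Post.find(params[:id])

    if stale?(etag: @post, last_modified: @post.updated_at)
      render json: @post
    end
  end
end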

Handled at the ActionPack layer:

  • Resourceful Routing: If you're building a RESTful JSON API, you want to be using the Rails router. Clean and conventional mapping from HTTP to controllers means not having to spend time thinking about how to model your API in terms of HTTP.
  • URL Generation: The flip side of routing is URL generation. A good API based on HTTP includes URLs (see the GitHub Gist API for an example; a routing and URL generation sketch follows this list).
  • Header and Redirection Responses: head :no_content and redirect_to user_url(current_user) come in handy. Sure, you could manually add the response headers, but why?
  • Caching: Rails provides page, action and fragment caching. Fragment caching is especially helpful when building up a nested JSON object.
  • Basic, Digest and Token Authentication: Rails comes with out-of-the-box support for three kinds of HTTP authentication.
  • Instrumentation: Rails 3.0 added an instrumentation API that will trigger registered handlers for a variety of events, such as action processing, sending a file or data, redirection, and database queries. The payload of each event comes with relevant information (for the action processing event, the payload includes the controller, action, params, request format, request method and the request's full path).
  • Generators: This may be passé for advanced Rails users, but it can be nice to generate a resource and get your model, controller, test stubs, and routes created for you in a single command.
  • Plugins: Many third-party libraries come with support for Rails that reduces or eliminates the cost of setting up and gluing together the library and the web framework. This includes things like overriding default generators, adding rake tasks, and honoring Rails choices (like the logger and cache backend).
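
As a concrete illustration of the routing and URL generation points above (the application and resource names here are hypothetical):

# config/routes.rb -- one conventional declaration...
MyApp::Application.routes.draw do
  resources :posts do
    resources :comments, only: [:index, :create]
  end
end

# ...and URL generation comes along for free:
#   post_url(@post)          # => "http://example.com/posts/1"
#   post_comments_url(@post) # => "http://example.com/posts/1/comments"
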
Rails has an unquestionably awesome feature set even if applied exclusively to JSON APIs, and this guy is taking it completely for granted:
"So your Rails server becomes an API, and your web site, like the iOS app, is the client. It's a clean separation of responsibilies, but given what Rails was designed to do, it's like having a horse rider climb on top of an elephant."
The design of Rails, as of Rails 1.2, provided clean abstractions for using the same code to provide server-generated HTML views and "REST" APIs in multiple serialization formats. This was a big deal at the time, and "the time" was 2 years before Node even existed. Fast forward 4 years and Rails 3 has been rewritten with an emphasis on modularization, allowing you to strip out the components you don't use and build lightweight stacks with only the things you need. Rails::API provides convention over configuration for a lightweight JSON-oriented stack.
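
For anyone who missed that era, the respond_to abstraction looks like this (controller and model names are illustrative):

class PostsController < ApplicationController
  # One action, many representations: Rails runs whichever block
  # matches the requested format (Accept header or .json/.xml suffix).
  def show
    @post = Post.find(params[:id])

    respond_to do |format|
      format.html                          # renders show.html.erb
      format.json { render json: @post }
      format.xml  { render xml:  @post }
    end
  end
end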

But let me back up a little bit...
"The view in MVC is not just HTML and CSS; it's the presentation logic, and the presentation logic needs structure. With this need, client-side frameworks like Backbone, Spine, and Ember have come into the picture."
So I hear this guy Yehuda Katz worked on both Ember and Rails. You may have heard of Ember, it just won Throne of JS's framework of choice (Backbone won in the "library" category). But appeal to authority aside, what does using Ember and Rails in combination actually get you?

A problem I am certain you have run into is the manual nature of serializing JSON. Exactly how should you translate from a domain object into a JSON representation? What if the client wants to avoid repeat requests by eagerly loading other domain objects which are associated with the one you want to retrieve and including them in the JSON result? And wouldn't it be great if there were a single canonical representation for all of this that a standardized domain object abstraction running in the browser could automatically consume for us, so we don't have to manually write a bunch of JSON serialization and deserialization logic for everything in our system?

Can we put JSON on Rails? Yes we can: it's called ActiveModel::Serializers and Ember Data. All that glue code you've been writing over and over for serializing and deserializing JSON? Stop that. Seriously. You have better things to do than deal with the idiosyncrasies of whether you should wrap a particular array in an object, or return a literal string or number as opposed to an object for future proofing. You are wasting your time on such minutiae, and chances are the ActiveModel::Serializers representation is better than the one you are using. Let's take a look at why.

The defining characteristic of the ActiveModel::Serializers JSON representation is that it explicitly avoids nesting objects within objects, instead preferring to keep the resulting structure flat and using IDs to correlate the relationships between data in the structure. Here is an example of a "post" object which includes comments and tags, taken from the ActiveModel::Serializers README:
{
  "post": {
    "id": 1,
    "title": "New post",
    "body": "A body!",
    "comments": [ 1, 2 ]
  },
  "comments": [
    { "id": 1, "body": "what a dumb post", "tags": [ 1, 2 ] },
    { "id": 2, "body": "i liked it", "tags": [ 1, 3 ] },
  ],
  "tags": [
    { "id": 1, "name": "short" },
    { "id": 2, "name": "whiny" },
    { "id": 3, "name": "happy" }
  ]
}
There are multiple nested relationships in this document: the post has many comments, and comments have many tags. And yet we don't see duplication of comment or tag objects. We don't have to worry about which version of a repeated object is canonical, because there are no repeated objects. Objects within the resulting document are deduplicated and referred to symbolically by their ID. Using this JSON structure we can represent arbitrarily nested relationships between objects in the most efficient manner possible and completely avoid any problems with inconsistencies between duplicated versions of objects present in the document. This representation of JSON just makes sense, and perhaps you too have standardized upon it. Better yet, if you use this representation, then with very little effort on your part Ember Data can automatically consume it.
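
For reference, serializers producing roughly the document above could look like this (a sketch against the ActiveModel::Serializers 0.x API, where embed :ids, include: true is what yields the flat, side-loaded structure):

class PostSerializer < ActiveModel::Serializer
  embed :ids, include: true    # refer to associations by ID and
                               # side-load them at the document root
  attributes :id, :title, :body
  has_many :comments
end

class CommentSerializer < ActiveModel::Serializer
  embed :ids, include: true
  attributes :id, :body
  has_many :tags
end

class TagSerializer < ActiveModel::Serializer
  attributes :id, :name
end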

If you use Ember and Rails, you can abstract away JSON and save yourself the headache of writing custom serialization code. I'm going to say: score one for Rails and single page applications. Maybe you have some Node thing that can do that too, I don't know, but seriously, if you think Rails is bad for JSON APIs, you don't know Rails.

Moving right along, let's continue slogging through the Gish Gallop.

Node has nonblocking async I/O and Rails doesn't so Rails is slow!!!

Where to start with this one. Hmm, let's start here:
"When I think of Ruby and Rails' performance, I think of Ilya Grigorik."
Let me start by saying that Ilya is an awesome guy who has done a very thorough and nuanced survey of the many facets of Ruby performance over time. Taking any single thing he's said out of context and treating it like gospel is probably doing a disservice to Ilya. That said, let's see what thing Ilya said that this guy chose to single out and present out of context. Quoth Ilya:
"There is nothing about node that can't be reproduced in Ruby or Python (EventMachine and Twisted), but the fact that the framework forces you to think and use the right components in place (fully async & non-blocking) is exactly why it is currently grabbing the mindshare of the early adopters. Rubyists, Pythonistas, and others can ignore this trend at their own peril. Moving forward, end-to-end performance and scalability of any framework will only become more important."
So this is a line I hear out of Ryan Dahl a lot too. It's a line I used to believe.

Folks, I've been doing this stuff for a while. I first discovered synchronous I/O multiplexing when I was about 15, which for me was half a lifetime ago, and since then I've been building network servers using this approach. I've built my own abstraction layers across select/poll/epoll/kqueue. I wrapped libev for Ruby in Rev/Cool.io and nio4r, the latter a cross-platform I/O abstraction which uses Java NIO on JRuby. I cannot express to you how much work I've invested in doing things the evented, non-blocking way.

I don't think non-blocking I/O is a good fit for web applications that talk HTTP, although I think it can be a good fit for Websocket applications. I will get to my reasons later. But first, let's continue digging through the Gish Gallop:
"Ilya mentioned the framework/ecosystem that I now consider to be the threat to Rails: Node.js [...] The biggest thing I noticed was the difference in performance. It consumed less memory than Ruby, and it served more requests per second than Sinatra or even Rack."
I have a huge pet peeve, and that's when people talk about performance without numbers. I tried it and it was faster. I tried it and it was slower. If you really want to make a point about the performance of a particular thing, can you at least pretend you're using science?

I hate to do this, but I think I have to destroy your god. Let's see how Ilya's software stacks up to mine on a crappy "hello world" web server benchmark. First, the numbers for my web server Reel:

# httperf --num-conns=50 --num-calls=1000

Ruby Version        Throughput    Latency
------------        ----------    -------
JRuby HEAD          5650 reqs/s   (0.2 ms/req)
Ruby 1.9.3          5263 reqs/s   (0.2 ms/req)
JRuby 1.6.7         4303 reqs/s   (0.2 ms/req)
rbx HEAD            2288 reqs/s   (0.4 ms/req)

Let's compare to Ilya's web server Goliath, as well as Thin and Node.js:

Web Server          Throughput    Latency
----------          ----------    -------
Goliath (0.9.4)     2058 reqs/s   (0.5 ms/req)
Thin    (1.2.11)    7502 reqs/s   (0.1 ms/req)
Node.js (0.6.5)     11735 reqs/s  (0.1 ms/req)

All of these servers, including mine, are using non-blocking evented I/O. Is that remotely relevant? No. That's just a coincidence.

My web server is faster than Ilya's. So by Gish Gallop logic, Ilya must be wrong about everything. There must be no reason to use Ilya's web server. Let's write everything in Node since it won the benchmark.

There's a huge problem here: Goliath does things that Reel, Thin, and Node's HTTP server don't do. The reason it's slower isn't because Ilya sucks and is clueless about performance. The reason is that Goliath has features which these other web servers don't, which makes it an apples to oranges comparison. (I guess that makes me a scumbag for putting them all in a big list on the Reel web page.)

The same can be said of Rails: it probably isn't ever going to have better latency through the entire stack than any Node.js framework, but the latency of the Rails stack is probably going to be a lot less than that of your application logic, and that's still going to be a drop in the bucket compared to the network latency between your server and a given user.

Celluloid solves every single problem you're whining about better than Node

Node has a lot of problems, and I'm not just talking about the audience it attracts. Let me start by saying this: many of the things I have built in Celluloid are based off of technologies originally developed for Node. My web server Reel uses the Node HTTP parser, and it's quite likely that the next iteration of nio4r I develop will be based off of libuv.

All that said, let me start with Node's fundamental problem: callback-driven I/O. Celluloid::IO is one of many systems, including Erlang and Go, that demonstrate that "nonblocking" and "evented" I/O are orthogonal to callbacks. Celluloid uses Ruby's coroutine mechanism to provide a synchronous I/O API on top of an underlying nonblocking system. However, where systems like Node force you to use nonblocking I/O for everything, Celluloid lets you mix and match blocking and nonblocking I/O as your needs demand.
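
To make "synchronous API on top of a nonblocking system" concrete, here is a sketch in the spirit of the Celluloid::IO README's echo server (exact API details may vary by version):

require 'celluloid/io'

class EchoServer
  include Celluloid::IO    # TCPServer/TCPSocket become evented inside actors

  def initialize(host, port)
    @server = TCPServer.new(host, port)
    async.run              # start the accept loop inside this actor
  end

  def run
    loop { async.handle_connection @server.accept }
  end

  # Reads as blocking code, but each read/write merely suspends this
  # coroutine while the actor's reactor services other connections.
  def handle_connection(socket)
    loop { socket.write socket.readpartial(4096) }
  rescue EOFError
    socket.close
  end
end

EchoServer.new("127.0.0.1", 1234)
sleep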

If you have ever worked in a language like C(++) or Java, you probably know an amazing property of sockets: you can mix and match blocking and nonblocking I/O, even over the lifecycle of a single socket. Perhaps you will handle incoming sockets in a nonblocking manner at first, but if they make a complex request, you might change the socket to a blocking mode and hand it off to a worker thread.

Celluloid::IO makes this handoff completely transparent: simply by giving the socket to another Ruby thread which isn't a Celluloid::IO actor, it will automatically switch from nonblocking to blocking mode completely transparently.
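
A sketch of what that handoff looks like (the helper names here are hypothetical; the point is that nothing special happens at the boundary):

# Inside a Celluloid::IO actor: the accept loop runs evented...
def run
  loop do
    socket = @server.accept               # nonblocking under the hood
    if complex_request?(socket)
      # ...but hand the socket to a plain Ruby thread and it behaves
      # as an ordinary blocking socket, no ceremony required.
      Thread.new { handle_slowly(socket) }
    else
      async.handle_quickly(socket)
    end
  end
end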

But let's talk about Node's real fundamental problem, one that is extremely difficult to solve in any callback-driven system: flow control. Unfortunately, the Node.js community has adopted the phrase "flow control" to mean "building abstractions around managing callbacks", but the phrase "flow control" has a very specific meaning relating to the rates at which data is transmitted between systems.

In general, callback-driven systems can't manage flow control effectively. The most notable pathological case is the producer-consumer problem, whereby a slow consumer might force a system like Node to unboundedly buffer data from an unchecked producer. There's a clear and simple solution to this problem: make all I/O synchronous. Using coroutines that provide blocking-style APIs, you can easily compose producer/consumer problems in a manner that doesn't result in unbounded writes to a buffer, because simply by virtue of a virtual blocking API, the rate at which data is transferred from producer to consumer is kept in check.
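
A sketch of the idea, assuming blocking-style socket calls:

# Copy data from a fast producer to a slow consumer. Because write
# doesn't return until the data has been accepted, the slow consumer
# naturally throttles how quickly we read from the producer, so
# nothing piles up in an unbounded buffer.
def pump(producer, consumer)
  loop do
    chunk = producer.readpartial(16 * 1024)  # waits until data arrives
    consumer.write(chunk)                    # waits until the consumer keeps up
  end
rescue EOFError
  consumer.close
end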

But what about WebSockets?

Ruby has had some pretty awesome, albeit overlooked and therefore stagnant, solutions for WebSockets for a while, like Cramp. I've been working on web-based push technologies for half a decade now, and have explored a multitude of solutions including Comet, XMPP/BOSH, RabbitMQ long polling, and my own XHR long polling systems, which I originally built around *gasp* threads nearly 3 years ago at this point.

Well, I'm quite happy to say that Reel now supports WebSockets. I certainly don't want to claim that my recent spike is anywhere near as mature as WebSockets in Node or their surrounding ecosystem. Instead, I think the API that Reel provides for WebSockets is simply better by design. If you managed to catch tenderlove's recent blog post on streaming live data, you may understand that all the previous APIs you may have encountered in systems like Rails or Node for streaming data were really obscuring the one API that truly makes sense for this use case: a socket.

WebSockets are in many ways similar to 0MQ sockets (which are used in DCell via Celluloid::ZMQ). WebSockets provide a framing mechanism, giving you a message-based transport instead of the stream-based transport TCP typically provides. That said, when processing message sequences, callbacks become extremely problematic, because you must reconstruct the state of the current request from the point of each incoming message. Callbacks work well for something like a chat protocol where there is no state relationship between messages, but as soon as there is one, you are effectively stuck building a finite state machine to manage the processing of each incoming message.

This is madness. There's a much better and much more straightforward solution to this problem: just use the goddamn stack. In order to do so, you need to provide a "blocking" API, but that doesn't preclude using nonblocking I/O underneath. Celluloid::IO, Go, and Erlang all let you build concurrent, multithreaded, and potentially multicore systems on top of coroutines spread across multiple native threads.
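
Here is the shape of what "using the stack" buys you, assuming a Reel-style API where socket.read returns the next complete WebSocket message (perform is a hypothetical application method):

# A multi-step protocol as straight-line code. The state between
# messages (who the user is, which step we're on) lives in local
# variables on the stack, with no hand-rolled state machine.
def handle_websocket(socket)
  socket << "Who are you?"
  username = socket.read    # coroutine suspends until a message arrives

  socket << "Hello, #{username}. What would you like to do?"
  while command = socket.read
    socket << perform(username, command)
  end
rescue EOFError
  socket.close
end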

That said, native threads are cheap nowadays and they're only getting cheaper. On most Ruby VMs a native thread will cost you about 20kB of RAM. If you want you can just build blocking I/O systems completely out of native threads without using any sort of evented I/O, and these systems can scale up to tens of thousands of connections.
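
And a thread-per-connection server really is this simple, with no event loop in sight (at ~20kB per thread, ten thousand connections costs on the order of 200MB in stacks):

require 'socket'

# A plain blocking echo server: one native thread per connection.
server = TCPServer.new(9234)
loop do
  Thread.new(server.accept) do |sock|
    begin
      loop { sock.write sock.readpartial(4096) }
    rescue EOFError
      sock.close
    end
  end
end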

Don't believe the hype

Node provides a limited subset of what Ruby can do, and it can be done better with Ruby. Node does not have a web framework of the same caliber as Rails. Node doesn't have threads, which in Ruby spare you from Node's callback soup. Finally, there's the elephant in the room: JavaScript is a terrible, terrible programming language compared to Ruby. We're forced to use JavaScript in the browser, but on the server, we can choose the best language for the job.

Ruby on Rails remains the best-in-class web framework, and while there are arguments to be made against it, the ones I hear coming out of confused Node.js detractors do not hold water.