Friday, August 10, 2012

Debunking the Node.js Gish Gallop

A programmer who was once a Ruby on Rails enthusiast switches to Node.js and thinks it's awesome, then proceeds to write a blog post about why Node is the bee's knees and Rails is crap. Attention is drawn to the changing nature of web design, from web pages with server-generated HTML to single-page JS-heavy apps written using Backbone, Ember, etc. Stop me if you think that you've heard this one before...

This is an argument I keep hearing over and over, and as far as I'm concerned it's nothing but a Gish Gallop of completely specious arguments. But I really worry, because I keep hearing it so often that I fear people are actually believing it. I'm not sure if people are running into problems, reading some of the prevailing "wisdom", and coming to the same conclusion, or what. This really makes me sad, because whenever I read posts like this, I do feel my previous passion for these same ideas, but for me that was half a lifetime ago, and my opinions have changed. I have been down these roads, over mountains, blazed my own trails, and then realized how stupid I was...

How do you defeat the Gish Gallop? I don't really enjoy doing this, but as far as I can tell there is no other way: we must go through the arguments one by one and show why they are completely ludicrous. So here we go...

In case you were confused, Rails is AWESOME for JSON APIs and single page applications

I love client-heavy HTML5/JS apps. I don't want every page on the web to be one, but there are many applications that can benefit a ton from keeping all of their state in the browser. In general: if you can do something without having to go across the network to do it, you will provide a better user experience, bar none.

The primary thing these applications crave is an awesome JSON API (and Websockets... stay tuned). So why should you use Rails for a JSON API? Isn't Rails designed for HTML/JS pages? What benefit does Rails give you for building JSON APIs? And isn't Rails really slow?

Well no, I've been through this before. If you are building API-only applications with a single-page HTML5/JS frontend, you should definitely check out Rails::API. Rails::API completely eliminates any ActionView-centrism you may be worried about in Rails, and gives you awesome tools for building JSON APIs, like ActiveModel::Serializers. But that alone can't express what Rails brings to the table, so here is a list of features Rails provides which are useful for JSON APIs, courtesy of the Rails::API README:

Handled at the middleware layer:

  • Reloading: Rails applications support transparent reloading. This works even if your application gets big and restarting the server for every request becomes non-viable.
  • Development Mode: Rails applications come with smart defaults for development, making development pleasant without compromising production-time performance.
  • Test Mode: Ditto test mode.
  • Logging: Rails applications log every request, with a level of verbosity appropriate for the current mode. Rails logs in development include information about the request environment, database queries, and basic performance information.
  • Security: Rails detects and thwarts IP spoofing attacks and handles cryptographic signatures in a timing attack aware way. Don't know what an IP spoofing attack or a timing attack is? Exactly.
  • Parameter Parsing: Want to specify your parameters as JSON instead of as a URL-encoded String? No problem. Rails will decode the JSON for you and make it available in params. Want to use nested URL-encoded params? That works too.
  • Conditional GETs: Rails handles conditional GET (ETag and Last-Modified) processing, examining request headers and returning the correct response headers and status code. All you need to do is use the stale? check in your controller, and Rails will handle all of the HTTP details for you.
  • Caching: If you use dirty? with public cache control, Rails will automatically cache your responses. You can easily configure the cache store.
  • HEAD requests: Rails will transparently convert HEAD requests into GET requests, and return just the headers on the way out. This makes HEAD work reliably in all Rails APIs.

Handled at the ActionPack layer:

  • Resourceful Routing: If you're building a RESTful JSON API, you want to be using the Rails router. Clean and conventional mapping from HTTP to controllers means not having to spend time thinking about how to model your API in terms of HTTP (see the sketch after this list).
  • URL Generation: The flip side of routing is URL generation. A good API based on HTTP includes URLs (see the GitHub gist API for an example).
  • Header and Redirection Responses: head :no_content and redirect_to user_url(current_user) come in handy. Sure, you could manually add the response headers, but why?
  • Caching: Rails provides page, action and fragment caching. Fragment caching is especially helpful when building up a nested JSON object.
  • Basic, Digest and Token Authentication: Rails comes with out-of-the-box support for three kinds of HTTP authentication.
  • Instrumentation: Rails 3.0 added an instrumentation API that will trigger registered handlers for a variety of events, such as action processing, sending a file or data, redirection, and database queries. The payload of each event comes with relevant information (for the action processing event, the payload includes the controller, action, params, request format, request method and the request's full path).
  • Generators: This may be passé for advanced Rails users, but it can be nice to generate a resource and get your model, controller, test stubs, and routes created for you in a single command.
  • Plugins: Many third-party libraries come with support for Rails that reduces or eliminates the cost of setting up and gluing together the library and the web framework. This includes things like overriding default generators, adding rake tasks, and honoring Rails choices (like the logger and cache backend).
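
To make the routing and URL generation points concrete, here is a minimal sketch (the application and resource names are hypothetical):

# config/routes.rb: conventional mapping from HTTP verbs to controller actions
MyApi::Application.routes.draw do
  resources :posts, only: [:index, :show, :create]
end

# In a controller, the flip side: URL generation and header responses
#   redirect_to post_url(post)   # responds with a Location header
#   head :no_content             # responds 204 with no body
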
Rails has an unquestionably awesome feature set even if applied exclusively to JSON APIs, and this guy is taking it completely for granted:
"So your Rails server becomes an API, and your web site, like the iOS app, is the client. It's a clean separation of responsibilies, but given what Rails was designed to do, it's like having a horse rider climb on top of an elephant."
The design of Rails, as of Rails 1.2, provided clean abstractions for using the same code to provide server-generated HTML views and "REST" APIs in multiple serialization formats. This was a big deal at the time, and "the time" was 2 years before Node even existed. Fast forward 4 years and Rails 3 has been rewritten with an emphasis on modularization, allowing you to strip out the components you don't use and build lightweight stacks with only the things you need. Rails::API provides convention over configuration for a lightweight JSON-oriented stack.
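
To give a flavor of how lean that stack is, here is a minimal sketch of a Rails::API-style controller (assuming the rails-api gem's ActionController::API base class; the Post model is hypothetical):

class PostsController < ActionController::API
  def show
    post = Post.find(params[:id])

    # Conditional GET: stale? examines If-None-Match / If-Modified-Since
    # request headers and returns 304 Not Modified when they still match
    if stale?(etag: post, last_modified: post.updated_at)
      render json: post
    end
  end
end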

But let me back up a little bit...
"The view in MVC is not just HTML and CSS; it's the presentation logic, and the presentation logic needs structure. With this need, client-side frameworks like Backbone, Spine, and Ember have come into the picture."
So I hear this guy Yehuda Katz worked on both Ember and Rails. You may have heard of Ember, it just won Throne of JS's framework of choice (Backbone won in the "library" category). But appeal to authority aside, what does using Ember and Rails in combination actually get you?

A problem I am certain you have run into is the manual nature of serializing JSON. Exactly how should you translate from a domain object into a JSON representation? What if the client wants to avoid repeat requests by eagerly loading other domain objects which are associated with the one you want to retrieve and including them in the JSON result? And wouldn't it be great if there were a single canonical representation for all of this that a standardized domain object abstraction running in the browser could automatically consume for us, so we don't have to manually write a bunch of JSON serialization and deserialization logic for everything in our system?

Can we put JSON on Rails? Yes we can: it's called ActiveModel::Serializers and Ember Data. All that glue code you've been writing over and over for serializing and unserializing JSON? Stop that. Seriously. You have better things to do than deal with the idiosyncrasies of whether you should wrap a particular array in an object or return a literal string or number as opposed to an object for future proofing. You are wasting your time with this minutiae and chances are the ActiveModel::Serializers representation is better than the one you are using. Let's take a look at why. 

The defining characteristic of the ActiveModel::Serializers JSON representation is that it explicitly avoids nesting objects within objects, instead preferring to keep the resulting structure flat and using IDs to correlate the relationships between data in the structure. Here is an example of a "post" object which includes comments and tags, taken from the ActiveModel::Serializers README:
{
  "post": {
    "id": 1,
    "title": "New post",
    "body": "A body!",
    "comments": [ 1, 2 ]
  },
  "comments": [
    { "id": 1, "body": "what a dumb post", "tags": [ 1, 2 ] },
    { "id": 2, "body": "i liked it", "tags": [ 1, 3 ] },
  ],
  "tags": [
    { "id": 1, "name": "short" },
    { "id": 2, "name": "whiny" },
    { "id": 3, "name": "happy" }
  ]
}

There are multiple nested relationships in this document: the post has many comments, and comments have many tags. And yet we don't see duplication of comment or tag objects. We don't have to worry about which version of a repeated object is canonical, because there are no repeated objects. Objects within the resulting document are deduplicated and referred to symbolically by their ID. Using this JSON structure we can represent arbitrarily nested relationships between objects in the most efficient manner possible and completely avoid any problems with inconsistencies between duplicated versions of objects present in the document. This representation of JSON just makes sense, and perhaps you too have standardized upon it. Better yet, if you use this representation, then with very little effort on your part Ember Data can automatically consume it.
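
For reference, here is roughly what the serializers producing that document look like, as a sketch against the 0.8-era ActiveModel::Serializers API (embed :ids, include: true is what side-loads the associated records at the top level instead of nesting them):

class PostSerializer < ActiveModel::Serializer
  embed :ids, include: true
  attributes :id, :title, :body
  has_many :comments
end

class CommentSerializer < ActiveModel::Serializer
  embed :ids, include: true
  attributes :id, :body
  has_many :tags
end

class TagSerializer < ActiveModel::Serializer
  attributes :id, :name
end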

If you use Ember and Rails, you can abstract away JSON and save yourself the headache of writing custom serialization code. I'm going to say: score one for Rails and single page applications. Maybe you have some Node thing that can do that too, I don't know, but seriously, if you think Rails is bad for JSON APIs, you don't know Rails.

Moving right along, let's continue slogging through the Gish Gallop.

Node has nonblocking async I/O and Rails doesn't so Rails is slow!!!

Where to start with this one. Hmm, let's start here:
"When I think of Ruby and Rails' performance, I think of Ilya Grigorik."
Let me start by saying that Ilya is an awesome guy who has done a very thorough and nuanced survey of the many facets of Ruby performance over time. Taking any single thing he's said out of context and treating it like gospel is probably doing a disservice to Ilya. That said, let's see what thing Ilya said that this guy chose to single out and present out of context. Quoth Ilya:
"There is nothing about node that can't be reproduced in Ruby or Python (EventMachine and Twisted), but the fact that the framework forces you to think and use the right components in place (fully async & non-blocking) is exactly why it is currently grabbing the mindshare of the early adopters. Rubyists, Pythonistas, and others can ignore this trend at their own peril. Moving forward, end-to-end performance and scalability of any framework will only become more important."
So this is a line I hear out of Ryan Dahl a lot too. It's a line I used to believe.

Folks, I've been doing this stuff for a while. I first discovered synchronous I/O multiplexing when I was about 15, which for me was half a lifetime ago, and since then I've been building network servers using this approach. I've built my own abstraction layers across select/poll/epoll/kqueue. I wrapped libev for Ruby in Rev/Cool.io and nio4r, the latter of which is a cross-platform abstraction that uses Java NIO on JRuby. I cannot express to you how much work I've invested in doing things the evented non-blocking way.

I don't think non-blocking I/O is a good fit for web applications that talk HTTP, although I think it can be a good fit for Websocket applications. I will get to my reasons later. But first, let's continue digging through the Gish Gallop:
"Ilya mentioned the framework/ecosystem that I now consider to be the threat to Rails: Node.js [...] The biggest thing I noticed was the difference in performance. It consumed less memory than Ruby, and it served more requests per second than Sinatra or even Rack."
I have a huge pet peeve, and that's when people talk about performance without numbers. I tried it and it was faster. I tried it and it was slower. If you really want to make a point about the performance of a particular thing, can you at least pretend you're using science?

I hate to do this, but I think I have to destroy your god. Let's see how Ilya's software stacks up to mine on a crappy "hello world" web server benchmark. First, the numbers for my web server Reel:

# httperf --num-conns=50 --num-calls=1000

Ruby Version        Throughput    Latency
------------        ----------    -------
JRuby HEAD          5650 reqs/s   (0.2 ms/req)
Ruby 1.9.3          5263 reqs/s   (0.2 ms/req)
JRuby 1.6.7         4303 reqs/s   (0.2 ms/req)
rbx HEAD            2288 reqs/s   (0.4 ms/req)

Let's compare to Ilya's web server Goliath, as well as Thin and Node.js:
Web Server          Throughput    Latency
----------          ----------    -------
Goliath (0.9.4)     2058 reqs/s   (0.5 ms/req)
Thin    (1.2.11)    7502 reqs/s   (0.1 ms/req)
Node.js (0.6.5)     11735 reqs/s  (0.1 ms/req)

All of these servers, including mine, are using non-blocking evented I/O. Is that remotely relevant? No. That's just a coincidence.

My web server is faster than Ilya's. So by Gish Gallop logic, Ilya must be wrong about everything. There must be no reason to use Ilya's web server. Let's write everything in Node since it won the benchmark.

There's a huge problem here: Goliath does things that Reel, Thin, and Node's HTTP server don't do. The reason it's slower isn't because Ilya sucks and is clueless about performance. The reason is that Goliath has features which these other web servers don't, which makes it an apples-to-oranges comparison. (I guess that makes me a scumbag for putting them all in a big list on the Reel web page.)

The same can be said of Rails: it probably isn't ever going to have better latency through the entire stack than any Node.js framework, but the latency of the Rails stack is probably going to be a lot less than that of your application logic, and that's still going to be a drop in the bucket compared to the network latency to a given user.

Celluloid solves every single problem you're whining about better than Node

Node has a lot of problems, and I'm not just talking about the audience it attracts. Let me start by saying this: many of the things I have built in Celluloid are based off of technologies originally developed for Node. My web server Reel uses the Node HTTP parser, and it's quite likely that the next iteration of nio4r I develop will be based off of libuv.

All that said, let me start with Node's fundamental problem: callback-driven I/O. Celluloid::IO is one of many systems, including Erlang and Go, that demonstrate that "nonblocking" and "evented" I/O are orthogonal to callbacks. Celluloid uses Ruby's coroutine mechanism to provide a synchronous I/O API on top of an underlying nonblocking system. However, where systems like Node force you to use nonblocking I/O for everything, Celluloid lets you mix and match blocking and nonblocking I/O as your needs demand.

If you have ever worked in a language like C(++) or Java, you probably know an amazing property of sockets: you can mix and match blocking and nonblocking I/O, even over the lifecycle of a single socket. Perhaps you will handle incoming sockets in a nonblocking manner at first, but if they make a complex request, you might change the socket to a blocking mode and hand it off to a worker thread.

Celluloid::IO makes this handoff completely transparent: simply by giving the socket to another Ruby thread which isn't a Celluloid::IO actor, it will automatically switch from nonblocking to blocking mode completely transparently.
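
Here is a minimal sketch along the lines of the Celluloid::IO README's echo server, showing what blocking-style code on top of evented I/O looks like (a sketch, not production code):

require 'celluloid/io'

class EchoServer
  include Celluloid::IO

  def initialize(host, port)
    # Celluloid::IO::TCPServer is an evented, API-compatible stand-in
    # for Ruby's ::TCPServer
    @server = TCPServer.new(host, port)
    async.run
  end

  def run
    loop { async.handle_connection(@server.accept) }
  end

  def handle_connection(socket)
    # Reads and writes look blocking, but only this actor's coroutine is
    # suspended while it waits; the reactor keeps servicing other sockets
    loop { socket.write(socket.readpartial(4096)) }
  rescue EOFError
    socket.close
  end
end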

But let's talk about Node's real fundamental problem, one that is extremely difficult to solve in any callback-driven system: flow control. Unfortunately, the Node.js community has adopted the phrase "flow control" to mean "building abstractions around managing callbacks"; however, the phrase "flow control" has a very specific definition relating to the rates at which data is transmitted between systems.

In general, callback-driven systems can't manage flow control effectively. The most notable pathological case is the producer-consumer problem, whereby a slow consumer might force a system like Node to unboundedly buffer data from an unchecked producer. There's a clear and simple solution to this problem: make all I/O synchronous. Using coroutines that provide blocking-style APIs, you can easily compose producer/consumer systems in a manner that doesn't result in unbounded buffering, because simply by virtue of the virtual blocking API, the rate at which data is transferred from producer to consumer is kept in check.
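
A hypothetical sketch of the difference (the names are illustrative): with blocking-style calls, copying from a fast producer to a slow consumer gets backpressure for free, because the coroutine or thread making the write simply doesn't resume until the consumer has drained enough data:

# Copy a file to a socket with bounded memory usage: each write suspends
# the caller until the socket can accept more data, so we never buffer
# more than one chunk at a time
def stream_file(path, socket)
  File.open(path, 'rb') do |file|
    while (chunk = file.read(16 * 1024))
      socket.write(chunk)
    end
  end
end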

But what about WebSockets?

Ruby has had some pretty awesome albeit overlooked and therefore stagnant solutions for WebSockets for a while, like Cramp. I've been working on web-based push technologies for half a decade now, and explored a multitude of solutions including Comet, XMPP/BOSH, RabbitMQ long polling, and my own XHR long polling systems which I originally built around *gasp* threads nearly 3 years ago at this point.

Well, I'm quite happy to say that Reel now supports WebSockets. I certainly don't want to say that my recent spike is anywhere near as mature as WebSockets in Node or their surrounding ecosystem. Instead, I think the API that Reel provides for WebSockets is simply better by design. If you managed to catch tenderlove's recent blog post on streaming live data, you may understand that all previous APIs you may have encountered in both systems like Rails or Node for streaming data were really obscuring the one API that truly makes sense for this use case: a socket.

WebSockets are in many ways similar to 0MQ sockets (which are used in DCell via Celluloid::ZMQ). WebSockets provide a framing mechanism which gives you a message-based transport instead of the typical stream-based transport provided by TCP. That said, when processing message sequences, callbacks become extremely problematic, because you must reconstruct the state of the current request from the point of each incoming message. Callbacks work well for e.g. a chat protocol where there is no state relationship between messages, but as soon as there is one, you are effectively stuck building a finite state machine to manage the processing of each incoming message.

This is madness. There's a much better and much more straightforward solution to this problem: just use the goddamn stack. In order to do so, you need to provide a "blocking" API, but this is in no way at odds with using nonblocking I/O underneath. Celluloid::IO, Go, and Erlang all let you build concurrent, multithreaded, and potentially multicore systems on top of coroutines spread across multiple native threads.
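
A hypothetical sketch of what that buys you (read_message and write_message are illustrative names, not a real API): a stateful message sequence becomes sequential code, with local variables on the stack carrying the state a callback-based design would have to reconstruct on every message:

# A multi-step protocol written straight down the page: no callbacks
# and no hand-rolled state machine (process is application-defined)
def handle(websocket)
  username = websocket.read_message            # step 1: wait for a login
  websocket.write_message("hello, #{username}")

  loop do
    command = websocket.read_message           # steps 2..n: handle commands
    websocket.write_message(process(command))
  end
end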

That said, native threads are cheap nowadays and they're only getting cheaper. On most Ruby VMs a native thread will cost you about 20kB of RAM. If you want you can just build blocking I/O systems completely out of native threads without using any sort of evented I/O, and these systems can scale up to tens of thousands of connections.
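
For example, a thread-per-connection echo server in Ruby is just a few lines, with no event loop in sight:

require 'socket'

server = TCPServer.new(1234)
loop do
  # One native thread per connection: at ~20kB apiece, tens of thousands
  # of connections are within reach
  Thread.new(server.accept) do |socket|
    while (line = socket.gets)
      socket.write(line)
    end
    socket.close
  end
end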

Don't believe the hype

Node provides a limited subset of what Ruby can do, and it can be done better with Ruby. Node does not have a web framework of the same caliber as Rails. Node doesn't have threads, which in Ruby spare you from Node's callback soup. Finally, there's the elephant in the room: JavaScript is a terrible, terrible programming language compared to Ruby. We're forced to use JavaScript in the browser, but on the server, we can choose the best language for the job.

Ruby on Rails remains the best-in-class web framework, and while there are arguments to be made against it, the ones I hear coming out of confused Node.js detractors do not hold water.

15 comments:

Haora said...

Even though I agree with you regarding your arguments in favor of Rails, I would do without your entire last paragraph... saying that "Ruby on Rails remains the best-in-class web framework" is a bit of a HUGE generalization; you're implying that it's the silver bullet of web frameworks... that's never good.
Also, your hate for Javascript, is it really justified? I have limited experience on the web, since I've only been working with it for the last 10 years (almost), but Javascript is a language that has received a lot of hate from people who don't really take the time to learn it properly... are you one of them?

Other than that, it was a great post, and the work you did to debunk all the gish gallop was great, keep it up and keep posting.

Xagile said...

Suggesting that the reason to use Node.js is performance is bullshit. I agree with you there.

However, I happen to like JavaScript about as much as I like Ruby, and there are advantages to working in the same language on both sides: like you can share modules, and you only have to know one language. Plus, if you really don't like JavaScript you can use CoffeeScript pretty much everywhere with just a few npm modules.

I think that Rails is an excellent platform and I plan to try out some of the other things you suggested. However, I also think that Node.js is pretty useful. It just needs some time to mature.

Thanks!

SubStack said...

In general, callback-driven systems can't manage flow control effectively. The most notable pathological case is the producer-consumer problem, whereby a slow consumer might force a system like Node to unboundedly buffer data from an unchecked producer.

It seems that you've never heard of streams in node. The great thing about streams is that they can handle the TCP backpressure problem very elegantly. Just do:

fs.createReadStream('bigfile.txt').pipe(res)

in an http server handler and the response handler will tell the readable stream to slow down if the remote client can't download the data fast enough.

The streaming api is all built with callbacks so I find your assertion that callbacks can't handle the backpressure problem to be really odd considering how obsessed the core node team is about doing backpressure correctly.

It will be even easier to write streaming abstractions with correct backpressure behavior in node 0.9.

MoJava Cafe said...

"However, where systems like Node force you to use nonblocking I/O for everything, Celluloid lets you mix and match blocking and nonblocking I/O as your needs demand."

Node does not force one to use non-blocking IO for everything.

Tony said...

@SubStack: The Streams API is great, except not everything uses it, thus you encounter the problem of inconsistent I/O semantics... the very problem Node was supposed to solve by making everything nonblocking.

@MoJava Cafe: sure, except you don't have additional threads to perform I/O in like you do in Ruby, so if you block, you lose concurrency and block the entire process.

Ege said...

You can write blocking code and put it in a forked process.

Node.js projects are easy to start with, and they are becoming easier to maintain as the platform itself and the frameworks on top of it mature while staying flexible enough.

Kevin McCaughey said...
This comment has been removed by the author.
Xagile said...

I think Rails is ideal for interacting with objects that are backed by structured data. There are all kinds of applications for that. But there are times when you don't need that or want something different. I like Sinatra a lot too.

Roger Pack said...

I would say Rails forever if it weren't so very slow...

Tony said...

@Ege with multiprocess systems you run afoul of one of two problems:

1) Serialization is expensive
2) Multiprocess shared memory is hard, ugly, broken

Regarding #1, if you can avoid crossing process barriers and stay within a shared heap, or manage your own memory in the same process, you have the fastest solution available.

Regarding #2, the semantics of SysV shared memory systems are complicated and ugly. I know there are a lot of fans of e.g. python multiprocessing, but that's significantly more complex than using threads + Jython (or JRuby).

Peter Moberg said...

Great article! Totally agree that choosing Node.js for performance reasons is wrong.

My biggest problem with Node is the callback-driven way you have to write your applications. It really obfuscates the code and makes it unnecessarily complicated to read and understand.

I blogged about it a while back:
Callbacks are polluting your code

Gui said...

Pretty much all the bad things you said about Node.js are wrong.

- JSON serialization is a reason Rails kicks ass. Really? Pretty much any framework can do all the things you mentioned in countering the first "argument" against Node (and that argument has nothing to do with the comparison between Node.js and Rails, btw).

- Performance of node.js is better? Yeah, all your arguments actually confirm that. That is so contradictory if we remember the title of this amazing post.

- "All that said, let me start with Node's fundamental problem: callback-driven I/O." That just has nothing to do with reality. Node.js actually allows you not to worry about the stupid concept of flow control. And even if you want to do that foolishness, Node.js will allow you to do that PRETTY EASILY.

You should be ashamed of yourself and delete that post. For some unbelievable reason your post ranks well at Google.... and it's because of clueless people like you that more and more people give Rails up and choose Node.js.

Guess you just have no clue about programming fundamentals. Here's one: REST is analogous to a messaging system. So just worry about the GODDAM MESSAGES... oh, and btw, messages are async.

Pastor Joe said...

Meh. Those who think JavaScript is a terrible language typically don't understand the difference between OOP and functional programming. They are two entirely different paths.

The author would do well to stick to his very well stated arguments in favor of Rails. Overtly generalized mudslinging does nothing to move the web forward. It only portrays the author as ignorant when he is not, and divides the larger web community, which thrives on openness and collaboration.

In short, fanboys kill innovation.

Mark Selby said...

Creating APIs in Rails is a terrible idea unless performance, and thus user experience and server costs, don't matter.

ActiveRecord is the biggest problem. In many use cases using find_by_sql will yield a 10x speedup, and there's no reason not to for APIs: "this endpoint serves this data" should always be achievable in one query.

A lot of the movement to NoSQL databases is the result of people who don't know SQL using ORMs to access databases inefficiently. Trouble is, you really begin to lose the structure of the data, and then the client code becomes more and more *if*s and *then*s.

Writing APIs in Node is really easy: get this data and send it out.

"Callback hell" is just developers who are trying to apply a procedural mindset to evented code. In practice, it makes the code much more modular and maintainable. As soon as you hit your "callback hell" you realise that you're doing it wrong and that a module is needed.

Do this, let me know when you're done. Nothing hard about that.

The trick to all of it is to get your data right. Get the data right and much of the server code isn't needed. Unfortunately Rails works against that mindset, even encouraging people *not* to learn things like efficient querying over "we must do everything in Ruby and add more and more servers."

Been there, seen it, done it.

If you aren't comfortable with Javascript, what sort of web developer are you anyway?

Caner Çakmak said...

In the coming versions of Rails they should provide us easier and faster ways to interact with JS code using Ruby. We all use CoffeeScript and love it, but it's still JS. That may be something like Opal: something that enables us to play with the client side using just Ruby.

Tech is advancing and I feel myself *uckin inclined to use JS alternatives. But I don't want it! I don't want to use more and more JS! It's 2014 and yes, we know many are switching to Node for several reasons. It's obvious that Rails is losing its popularity and JS may take over; however, the magic wasn't Rails and won't be.

The magic is Ruby. I've never heard of a person not liking Ruby, but I've heard many hating JS!

So if we are forced to use JS, we have to sort it out. And we have to sort it out efficiently and effectively. Until that day, I'll continue writing Ruby.

Thanks for this great post