Tuesday, October 4, 2011

Node.js has jumped the shark

I've been trying to understand this whole Node.js thing that's been going on for the past few days. It's taught me a lot about how the Node.js community tries to reason about problems, and perhaps what's the real problem with Node.

It began with a relatively short rant by Ryan Dahl. Ryan talk's about how he's mad at complex software, which is at least an arguable position. I liked his rant, but Ted Dziuba did not.

Ted Dziuba wrote a brilliant troll post which lambasted Ryan and the concept of Node.js in general. I'm a huge fan of Ted's previous trolling, especially his parody TechCrunch site uncov, which trolled TechCrunch before TechCrunch became a parody of itself. Except in the case of uncov, it would be a few years before TechCrunch became a parody itself. With Node.js, the reaction was almost instantaneous.

Ted's argument centered around the problem of doing non-trivial computation within the event loop of a system like Node, and how that would affect the performance of the system. To illustrate this point, he gave an intentionally deoptimized Fibonacci function, along with the boldface caveat "Yes, I know there's a closed form solution. Shouldn't you be in front of a mirror somewhere, figuring out how to introduce yourself to her?"

Reading through the lines of Ted's trolling, his point is "I'm not specifically talking about Fibonacci. I'm talking about non-trivial computation as a general problem in Node." He specifically chose a slow algorithm to illustrate his point. However, what was the Node community's reaction?

One reaction was to go out and implement Ted's intentionally-retarded Fibonacci algorithm in a bunch of different languages and benchmark how fast the VM could optimize away a crappy algorithm. Based on several benchmarks, V8 actually does a comparatively good way of optimizing away the retarded algorithm compared to a lot of other languages. Cool story bro!

In the Reddit comments for this story I first truly gazed into the abyss of Node.js insanity. Node provides a way to schedule an anonymous function to run the next time it's event loop iterates using a function called nextTick. EventMachine (ugh) users may be familiar with this same pattern with EM.next_tick (which needs to die for most use cases, but that's the subject of another blog post)

Redditor "headzoo" suggests that we provide a "non-blocking solution" by unrolling the Fibonacci loop and execute one iteration of the Node.js event loop, calculating one iteration for each I/O multiplexing system call Node is making. When Ted Dziuba asked "Have you taken epoll into your heart?," I think he had no idea of the depths Node people actually have.

I don't really get what's going on in Node people's heads here. I think they seriously believe that throwing computation into the middle of an event loop is a good idea, because Node's event loop is FAST!!#$#! funroll-loops + nextTick = LUDICROUS SPEED!!

This crap is absolutely insane. For each iteration of the loop, Node is making an expensive system call. It's also defeating all of the mechanisms that V8's impressive JIT would use to optimize this problem, as substantiated by the previously mentioned Nodian's roflbenchmark. Clearly this is a very bad idea. But wait, it gets worse!

It wouldn't take long before another Node.js fan put together a Github project for this technique, which also added an "optimization": memoization! In addition to unrolling each iteration of the Fibonacci function to run within the event loop, it now caches the nth result. Now let's throw ApacheBench at it and see how fast Node can serve the precomputed result where n = 100.

Holy balls, it does 5390.22 requests per second serving a memoized value when I ask ApacheBench to request the same memoized value over and over!!#$! TAKE THAT TED DZIUBA! Is it any good? One word: yes. Did we miss the point that this isn't about Fibonacci? One word: yes. (Side note: I second raganwald in adding "Is it any good? Yes." to the top of all my READMEs)

Memoizing each member of the Fibonacci sequence makes this algorithm O(n) in memory. For shits and grins I installed Node-fib and asked it for the 1 millionth Fibonacci number. Node ground away (on a single core of my quad core i7 MacBook) for about a minute before running out of memory, crashing the entire VM.

Okay, so mistakenly implementing an algorithm this pathologically bad when you're trying to prove a point to someone who's just trolling you is pretty bad, right? But it gets worse. 

After pointing this out, a member of the Node.js community (post now deleted) suggested I might have an obsolete version of Node with a 1GB heap limit (because Node uses a VM targeted at client-side JavaScript where 1GB heaps are an uncommon use case, but hey, let's throw it on the server!) and that I recompile without the 1GB restriction so that this retarded algorithm can continue eating up all my system RAM. He says this as if this is a good idea instead of, I dunno, using a better algorithm.

Clearly the next logical steps are to store previously computed Fibonacci numbers in MongoDB. Or you could build a system with Socket.IO which caches previously computed Fibonacci numbers client-side using a distributed hash table. Then you'd be well on your way to a multimillion dollar cloud-based Fibonacci number service. Eventually you'll probably want to build in an OpenMP C++ native extension to calculate Fibonacci numbers across multiple CPU cores. That way you'll have a truly roflscale Fibonacci web service.