Tuesday, April 17, 2012

Introducing DCell: actor-based distributed objects for Ruby

DCell, which is short for Distributed Celluloid (and pronounced like the batteries you used to jam into boom boxes and RC cars), is an actor-based distributed object-oriented programming framework for Ruby. Celluloid is an actor library I wrote for Ruby which exposes concurrent Ruby objects that "quack" just like any other Ruby object. DCell takes the asynchronous messaging protocol from Celluloid and exposes it to distributed networks of Ruby interpreters by marshaling Celluloid's messages as strings and sending them to other nodes over 0MQ.

Before I talk about DCell I'd like to talk a little bit about the history behind distributed objects in general and the ideas that DCell draws upon.

A Brief History of Distributed Objects

Nowadays you don't hear people talking much about distributed objects. However, once upon a time, distributed objects were big business. They used to be one of Steve Jobs' passions during his time at NeXT. In the mid-90's, Steve Jobs phased out NeXT's hardware division and began repositioning the company as, among other things, the "largest object supplier in the world." NeXT turned its focus exclusively to its software, namely the WebObjects framework for building dynamic web sites. It would also ship the Enterprise Objects Framework, which Steve described as allowing you to "make NeXTSTEP objects, and with no programming, have persistent and coherent storage with SQL databases" (sound a little bit like ActiveRecord a decade before ActiveRecord, anyone?)

WebObjects was built on a technology called Portable Distributed Objects, which allowed Objective-C applications developed on any platform to seamlessly interoperate over computer networks and across platforms. If you haven't seen it, watch Steve Jobs' 1995 presentation The Future of Objects (although you may want to skip directly to where Steve begins talking about distributed objects). Even now, some 16 years later, these demos still seem futuristic. Steve loved the simplicity of distributed objects: "Objects can message objects transparently that live on other machines over the network, and you don't have to worry about the networking gunk, and you don't have to worry about finding them, and you don't have to worry about anything. It's just as if you messaged an object that's right next door."

So why do some of Steve's demos seem futuristic and slightly beyond our grasp even a decade and a half later? Why is building multi-tier client/server (i.e. web) applications still a challenge that typically involves writing lots of client and server wrapper code instead of transparently invoking distributed objects? Unfortunately, Steve's rosy future of distributed objects never really came to pass. NeXT's technology was commercial, only available in Objective-C (at a time when C++ reigned supreme and no one used Objective-C), and couldn't beat the open web and the standards that would emerge for developing web APIs. Meanwhile, the industry standards for distributed objects would be mired in debacles of their own.

In the before time, in the long long ago, in the days before JSON, XML, and REST, when Tim Berners-Lee had just started hosting the world's first web site on his personal NeXTstation, C++ was the lingua franca of the day and people were just getting started making C++ programs communicate over networks. The Object Management Group (with the now-unfortunate acronym OMG) hashed out the necessary wire protocols, object representations, interface definitions, and a profusion of other standards needed to allow C++ programs to invoke remote objects over TCP/IP networks. The result was CORBA.

CORBA is a technology that drowned in its own complexity. However, in the early '90s it was the Enterprise: Serious Business way to allow applications to communicate over networks. Once you waded through the myriad complexities of the various CORBA technologies, you were left with objects you could invoke over the network in a manner somewhat reminiscent of the way you would invoke an object locally.

However, the web would soon take off and HTTP would become the de facto protocol programs used to communicate. Unfortunately, HTTP is a terrible protocol for implementing distributed objects. Where HTTP exposes a predefined set of verbs you can use to manipulate resources which have a number of possible representations, objects don't have representations: they are just objects, and you talk to them with messages. This didn't stop people from trying to build a distributed object protocol on top of HTTP though. The result was SOAP, a protocol which abandoned CORBA's "orbs" for web servers, its Interface Definition Language for WSDL, its Common Data Representation for XML, and its Internet Inter-ORB Protocol for HTTP.

This was something of a step in the right direction: SOAP actually was a "Simple" Object Access Protocol when compared to CORBA, and SOAP would soon vanquish CORBA for Enterprise: Serious Business applications. While SOAP would gain a few fans, particularly in the Java and .NET communities, who saw the value of being able to expose objects to the network with a few point-and-clicks which generated gobs of XML, SOAP would soon join CORBA's ranks as a reviled technology.

SOAP's complexity comes first and foremost from being a committee standard that would succumb to "too many cooks" syndrome in the same way CORBA did. It also suffered from trying to be a cross-language protocol that needed to deal with static type systems, requiring tools which could read WSDL definitions and spit out volumes of generated boilerplate code for interacting with remote services. Beyond that, SOAP suffered from an impedance mismatch with HTTP by largely ignoring the features HTTP provides, using it as little more than a transport wrapper for shoving blobs of XML across the network. The XML contained the actual messages to be sent to remote objects or the responses coming from a remote method invocation, while anything done at the HTTP level itself was just boilerplate.

REST to the rescue?

For all its complexity, it was easy to lose sight of what SOAP was actually trying to do. Rather than painlessly interacting with remote objects, SOAP left us wondering why things were so slow, staring at WSDL errors wondering what went wrong, and picking through gobs and gobs of XML trying to debug problems. REST, which eschewed distributed objects and favored the paradigms of HTTP, would be SOAP's coup de grâce. SOAP is now relegated to a handful of legacy enterprise applications whereas the rest of the open web has almost universally embraced REST.

So REST triumphed and web APIs reign supreme. Distributed objects are little more than a footnote in history. And we're still left wondering why putting together the sorts of demos Steve Jobs was showing with Portable Distributed Objects in 1995 is so hard.

Using REST makes sense when exposing services for third parties to use. However, if you control both the client and server, and have Ruby frontends talking to Ruby services, REST begins to look like a bunch of gunk that's getting in the way:

Implementing domain objects, REST services, and REST clients becomes work duplicated in 3 places across systems using this sort of architecture. Wouldn't it be a lot simpler if the frontend web application could simply consume remote services as if they were just objects?

This sort of simplicity is the goal of distributed objects.

Distributed Objects in Ruby

Ruby has its own foray into the world of distributed objects: DRb, or Distributed Ruby. DRb exposes Ruby objects to the network, each uniquely identified by a URI. Clients can ask for a DRb object by URI and get a DRbObject back. These DRbObjects act as proxies, intercepting calls and sending them over the network, serializing them with Ruby's Marshal, where they're handled by a remote DRbServer. DRbServer uses a thread-per-connection model, allowing it to concurrently process several requests. When a DRb connection handler receives a request, it looks up the requested object, invokes a method on it, then serializes the response and sends it back to the caller.
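To make that flow concrete, here is a minimal sketch using the drb library that ships with Ruby. The Greeter class is a made-up example, and the server and client share one process purely to keep the sketch self-contained (DRb may then shortcut the call locally rather than going over the socket); normally they would be separate Ruby VMs.

```ruby
require 'drb/drb'

# Hypothetical service object; any plain Ruby object can be exposed.
class Greeter
  def greet(name)
    "Hello, #{name}!"
  end
end

# Server side: expose the object. Port 0 asks the OS for a free port,
# and DRb.uri reports the URI that was actually bound.
DRb.start_service('druby://127.0.0.1:0', Greeter.new)
uri = DRb.uri

# Client side: ask for the object by URI and get a DRbObject proxy back.
greeter = DRbObject.new_with_uri(uri)

# The proxy intercepts the call and forwards it to the served object.
greeting = greeter.greet('DRb')
puts greeting

DRb.stop_service
```

The caller never sees sockets or serialization; it simply calls greet on what looks like a local object.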

For the most part, DRb has lingered in obscurity, although the recent publication of the dRuby book has stirred up a modicum of interest. Where Steve Jobs thought PDO was "by far the easiest way to build multi-tier client/server applications because of [its] completely transparent distributed object model," Rubyists don't turn to DRb to build multi-tier applications, but instead typically rely on REST, building out APIs for what is, in the end, distributed communication between Ruby objects.

Why is it then that DRb isn't the go-to tool people use for building multi-tiered web applications in Ruby? It's easy to say that DRb failed because people are used to thinking in terms of HTTP and can better understand the semantics of the system when using HTTP, especially when it comes to areas like load balancing and caching. Separating services with HTTP also opens up the door to reimplementing those services in a different language in the future. However, even with future-proofing for a rewrite in another language out of the picture, I think most Rubyists would still choose to use REST APIs instead of DRb, and I think that's a defensible position.

While DRb does a great job of trying to make access to remote objects as transparent as possible, it has a number of flaws. DRb is inherently multithreaded but doesn't give the user any sort of tools or framework to manage concurrent access to objects. This means building DRb applications immediately exposes you to all the complexities of multithreaded programming whether you're aware of it or not, and Rubyists seem generally uncomfortable with building thread-safe programs. While DRb allows you to talk to in-process objects the same way you'd talk to out-of-process objects, it doesn't make it natural to build a program that way.
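As a rough illustration of the burden this places on the author of a service object, here is a sketch of the locking you would have to add by hand to any object served to concurrent callers. SafeCounter is a hypothetical class, and plain threads stand in for DRb's connection handlers.

```ruby
# Hypothetical shared service object, guarded by a Mutex.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # The read-modify-write happens under the lock, so concurrent callers
  # cannot interleave between reading and writing @count.
  def increment
    @lock.synchronize { @count += 1 }
  end

  def count
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new

# Eight threads stand in for eight concurrent connection handlers.
threads = 8.times.map { Thread.new { 1000.times { counter.increment } } }
threads.each(&:join)

total_count = counter.count
puts total_count
```

Nothing in DRb forces or helps you to write the synchronize calls; forgetting them gives you a service that works until two clients arrive at once.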

Beyond Objects: The Power of Distributed Erlang

While CORBA and SOAP are reviled for their complexity, there's another distributed system which is beloved for its high level of abstraction: Distributed Erlang. Erlang is, if anything, a highly opinionated language whose central design goal is to build robust self-healing systems you never need to shut down. When it comes to distribution, Erlang's goal is to make it as transparent as possible. Erlang is a dynamic language which insists you express everything within a scant number of core types. This makes serializing state in Erlang so you can ship it across the wire extremely simple and fast.

However, the real strength of Erlang is the Actor Model, which can be more or less summarized as follows:
  1. Actors communicate by sending messages
  2. Every actor has a mailbox with a unique address. If you have an actor's address, you can send it messages
  3. Actors can create other actors, and when they do they know their address so they can send the newly-created actors messages
  4. Actors can pass addresses (or handles) to themselves or other actors in messages
Erlang uses this method within individual VMs as the basis of its concurrency model. Erlang actors (a.k.a. processes) all run concurrently and communicate with messages. However, Erlang also supports distribution using the exact same primitives it uses for concurrency. It doesn't matter which type of actor you're talking to in Erlang, they "quack" the same, and thus Erlang has you model your problem in a way that provides both concurrency and distribution using the same abstraction.
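The four rules above can be sketched in a few lines of plain Ruby. This is only an illustration, not how Erlang or Celluloid implement actors: TinyActor is a hypothetical class where a thread drains a Queue that serves as the mailbox, and holding a reference to the object plays the role of knowing its address.

```ruby
# Hypothetical minimal actor: a thread draining a mailbox (a Queue).
class TinyActor
  def initialize(&behavior)
    @mailbox = Queue.new
    @thread = Thread.new do
      # Handle messages one at a time until told to stop.
      while (message = @mailbox.pop) != :stop
        behavior.call(message)
      end
    end
  end

  # Whoever holds a reference (the "address") can send a message.
  # Sending never waits for the message to be handled.
  def send_message(message)
    @mailbox << message
    self
  end

  def stop
    @mailbox << :stop
    @thread.join
  end
end

results = Queue.new

# The message carries the address of the actor that should get the reply.
doubler = TinyActor.new do |(number, reply_to)|
  reply_to.send_message(number * 2)
end

collector = TinyActor.new { |value| results << value }

doubler.send_message([21, collector])
answer = results.pop # blocks until the collector has handled the reply
puts answer

doubler.stop
collector.stop
```

Because the only way to interact with an actor is to put a message in its mailbox, the same program structure works whether the mailbox lives in the same VM or on another machine.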

Distributed Erlang offers several features aimed at building robust distributed systems. The underlying messaging protocol is asynchronous, allowing many more messaging patterns than traditional RPC systems (e.g. HTTP) which use a request/response pattern that keeps the client and server in lockstep. Some examples of these patterns are round robin (distributing messages across N actors), scatter/gather (distributing computation across N actors and gathering the results), and publish/subscribe (allowing actors to register interest in a topic, then informing them of events related to that topic).
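As an illustration of one of these patterns, here is a sketch of scatter/gather in plain Ruby, with threads and a Queue standing in for actors and their mailboxes. The scatter_gather helper is hypothetical and not part of any library.

```ruby
# Scatter: hand each input to its own worker thread.
# Gather: collect every worker's reply from a shared queue.
def scatter_gather(inputs)
  replies = Queue.new
  workers = inputs.each_with_index.map do |input, index|
    # Tag each reply with its index so results can be put back in order.
    Thread.new { replies << [index, yield(input)] }
  end
  workers.each(&:join)

  # Replies arrive in completion order, so sort by the original index.
  Array.new(inputs.size) { replies.pop }.sort_by(&:first).map(&:last)
end

squares = scatter_gather([1, 2, 3, 4]) { |n| n * n }
p squares
```

A lockstep request/response protocol would have to process the four inputs one after another; asynchronous sends let all four be in flight at once.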

In addition, Erlang processes can link to each other and receive events whenever a remote actor exits (i.e. if it crashes). This allows you to build robust systems that can detect errors and take action accordingly. Erlang emphasizes a "fail early" philosophy where actors are encouraged not to try to handle errors but instead crash and restart in a clean state. Linking allows groups of interdependent actors to be taken down en masse, with all of them restarting in a clean state afterward. Exit events can also be handled, which is useful in distributed systems for things like leader election.

DCell provides all of these features. When you create an actor with Celluloid, a proxy object to the actor is returned. This proxy lets you use the method protocol to communicate with an actor using messages. DCell implements special marshalling behavior for these proxy objects, allowing you to pass them around between nodes in a DCell system and invoke methods on remote actors in the exact same way you would with local actors.

Unlike DRb, DCell also exposes asynchronous behaviors, such as executing method calls in the background and using futures to kick off a method invocation in advance and wait for the result later. DCell also lets distributed actors link to each other and be informed when a remote actor terminates.

I'm certainly not the first to imitate Erlang's approach to distribution. It's been seen in many other (distributed) actor frameworks, including the Akka framework in Scala and the Jobim framework in Clojure.

Bringing Erlang's ideas over to Ruby

I have a long history of projects that try to cross-pollinate Ruby and Erlang. My first attempt was Revactor, an earlier actor library which provided a very raw and low-level API almost identical to the Rubinius Actor API. Revactor modeled each actor as a Fiber and thus provided no true concurrency. Another of my projects, Reia, tried to bring a more friendly syntax and OO semantics to Erlang.

With Celluloid I've come full circle, trying to implement Erlang's ideas on Ruby again. Only this time, Celluloid makes working with actors easy and intuitive by embracing the uniform access principle and allowing you to build concurrent systems that you can talk to just like any other Ruby object. Celluloid also provides asynchronous calls (what Erlang would call a "cast") where a method is invoked on the receiver but the caller doesn't wait for a response. In addition to that Celluloid provides futures, which allow you to kick off a method on a remote actor and obtain the value returned from the call at some point in the future.
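The idea behind a future can be sketched in plain Ruby. This is not Celluloid's API: SimpleFuture is a hypothetical class that uses a bare thread where Celluloid would send a message to an actor.

```ruby
# Hypothetical future: start the work now, block for the value only when asked.
class SimpleFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Thread#value joins the thread and returns the block's result,
  # re-raising any exception the block raised.
  def value
    @thread.value
  end
end

future = SimpleFuture.new { (1..10).reduce(:+) }

# ... the caller is free to do other work here ...

total = future.value
puts total
```

An asynchronous call in the "cast" sense is the same thing without the final value call: the work is kicked off and the caller never asks for the result.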

In addition Celluloid embraces many of Erlang's ideas about fault tolerance, including a "crash early" philosophy. Celluloid lets you link groups of interdependent actors together so if any one fails you can crash an entire group. Supervisors and supervision trees automatically restart actors in a clean state whenever they crash.
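Here is a deliberately simplified, synchronous sketch of the "crash and restart in a clean state" idea. It is not Celluloid's supervisor API: Tally and RestartingSupervisor are hypothetical classes, and a rescued exception stands in for an actor crash.

```ruby
# Hypothetical worker that holds state and crashes on bad input.
class Tally
  attr_reader :count

  def initialize
    @count = 0
  end

  def add(number)
    raise ArgumentError, "not a number: #{number.inspect}" unless number.is_a?(Numeric)
    @count += number
  end
end

# Hypothetical supervisor: on any crash, discard the worker and build a
# fresh one instead of trying to repair the old one's state.
class RestartingSupervisor
  attr_reader :restarts

  def initialize(&factory)
    @factory = factory
    @worker = factory.call
    @restarts = 0
  end

  def call(method, *args)
    @worker.public_send(method, *args)
  rescue StandardError
    @worker = @factory.call
    @restarts += 1
    nil
  end
end

supervisor = RestartingSupervisor.new { Tally.new }
supervisor.call(:add, 5)
supervisor.call(:add, 'oops') # crashes; the worker is replaced
supervisor.call(:add, 2)

final_count = supervisor.call(:count)
puts final_count
```

The count ends at 2 rather than 7 because the restart threw away the crashed worker's state, which is exactly the trade the "crash early" philosophy makes: a known-clean state is preferred over a possibly corrupted one.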

Celluloid does all of this using an asynchronous message protocol. Actors communicate with other actors by sending asynchronous messages. A message might say an actor has crashed, or that another actor is requesting that a method be invoked, or that a method invocation is complete and the response is a given value. All of the heavy lifting for building robust, fault-tolerant systems is baked into Celluloid.

When programs are factored this way, adding distribution is easy. DCell takes the existing primitives Celluloid has built up for building concurrent programs and exposes them onto the network. DCell itself acts as little more than a message router, and the majority of the work in adding fault tolerance is still handled by Celluloid.

Getting Started with DCell

To understand how DCell works we need to look at how a cluster is organized. This is an example of a cluster with 5 nodes:



In this picture the green nodes represent individual Ruby VMs. The links between the nodes are shown in black or gray to illustrate actively connected or potentially connected nodes. DCell makes connections between nodes lazily as actors request them. This means DCell clusters are potentially fully connected networks where each of the nodes is (or can be) directly connected to every other node in the cluster. DCell doesn't implement any sort of routing system or overlay network, and instead depends on all nodes being directly accessible to each other over TCP.

DCell uses 0MQ to manage transporting data over the network. 0MQ supports several different messaging patterns, and in the future DCell may use more of them, but for the time being DCell uses PUSH/PULL sockets exclusively. This works well because Celluloid's messaging system is asynchronous by design: each node has a PULL socket that represents the node's mailbox, and the other nodes on the network have a PUSH socket to send that node messages.

To configure an individual DCell node, we need to give it a node ID and a 0MQ address to bind to. Node IDs look like domain names, and 0MQ addresses look like URLs that start with tcp:


To create a cluster, we need to start another Ruby VM and connect it to the first VM. Once you have a cluster of multiple nodes, you can bootstrap additional nodes into the cluster by pointing them at any node, and all nodes will gossip about the newly added node:

Once you are connected to another node, you can browse the available nodes using DCell::Node.all:


To invoke a service on a particular node, obtain a handle to its node object, then look up an individual actor by the name it's registered under. By default, all nodes run a basic information service which you can use to experiment with DCell:

To implement your own DCell service, all you have to do is create a Celluloid actor and register it on your node. See the info service source code for an example.

DCell Explorer

DCell also includes a simple web UI for visualizing the state of a particular DCell cluster. To launch the web UI, run:


Then go to http://localhost:8000/ (provided you used the same host/port). You should see the following:


This is the basic dashboard that DCell's web UI provides. You can see connected nodes, their connection state, and if they're available browse the info service and see various information about them.

We're just getting started...

DCell is only about half a year old, and still relatively immature, but already it's seen a great number of contributors and a lot of attention. Some of the most exciting developments are still on the horizon, including Paxos-powered multicall support which will let you call a quorum of nodes, along with generalized group membership support with leader election, and generalized pub/sub.

All that said, I'd like to think that DCell is the most powerful distributed systems framework available for Ruby today, and I would love to see the remaining bugs ironed out and missing features added.

That can only happen if people are using DCell, finding bugs, and reporting missing features. This may sound a little bit scary, but if you're considering building a nontrivial distributed system in Ruby, DCell is a great place to start.

Monday, March 19, 2012

Don't use bcrypt

(Edit: Some numbers for you people who like numbers)

If you're already using bcrypt, relax, you're fine, probably. However, if you're looking for a key derivation function (or in bcrypt's case, password encryption function) for a new project, bcrypt is probably not the best one you can pick. In fact, there are two algorithms which are each better in a different way than bcrypt, and also widely available across many platforms.

I write this post because I've noticed a sort of "JUST USE BCRYPT" cargo cult (thanks Coda Hale!) This is absolutely the wrong attitude to have about cryptography. Even though people who know much more about cryptography than I do have done an amazing job packaging these ciphers into easy-to-use libraries, use of cryptography is not something you undertake lightly. Please know what you're doing when you're using it, or else it isn't going to help you.

The first cipher I'd suggest you consider besides bcrypt is PBKDF2. It's ubiquitous and time-tested with an academic pedigree from RSA Labs, you know, the guys who invented much of the cryptographic ecosystem we use today. Like bcrypt, PBKDF2 has an adjustable work factor. Unlike bcrypt, PBKDF2 has been the subject of intense research and still remains the best conservative choice.
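For the Rubyists in the audience, here is a minimal sketch of deriving a key with PBKDF2 through Ruby's bundled openssl library. The derive_key helper, the choice of SHA-256, the 32-byte output, and the iteration count are all illustrative assumptions, not recommendations.

```ruby
require 'openssl'

# Derive a 32-byte key; `iterations` is the adjustable work factor.
def derive_key(password, salt, iterations)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, OpenSSL::Digest.new('SHA256'))
end

# A random per-password salt, stored alongside the derived key.
pbkdf2_salt = OpenSSL::Random.random_bytes(16)
pbkdf2_key  = derive_key('correct horse battery staple', pbkdf2_salt, 100_000)

# Verification repeats the derivation with the stored salt and iteration count.
pbkdf2_match = derive_key('correct horse battery staple', pbkdf2_salt, 100_000) == pbkdf2_key
puts pbkdf2_match
```

Raising the iteration count makes every guess more expensive for an attacker and for you in equal measure. Real verification code should also compare the two values in constant time rather than with ==.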

There has been considerably less research into the soundness of bcrypt as a key derivation function as compared to PBKDF2, and for that reason alone it is much harder to say what future attacks may be discovered against bcrypt. bcrypt has a higher theoretical-safety-to-compute-time factor than PBKDF2, but that won't help you if an attack is discovered which mitigates bcrypt's computational complexity. Such attacks have been found in the past against ciphers like 3DES. Where 3DES uses a 168-bit key, various attacks have reduced that key size's effectiveness to 80 bits.

PBKDF2 is used by WPA, popular password safes like 1Password and LastPass, and full-disk encryption tools like TrueCrypt and FileVault. While I often poke fun at Lamer News as a Sinatra antipattern, I have to applaud antirez on his choice of PBKDF2 when he got bombarded with a "just use bcrypt!" attack (although bro, antirez, there's a PBKDF2 gem you can use, you don't have to vendor it)

The second cipher to consider is scrypt. Not only does scrypt give you more theoretical safety than bcrypt per unit compute time, but it also allows you to configure the amount of space in memory needed to compute the result. Where algorithms like PBKDF2 and bcrypt work in-place in memory, scrypt is a "memory-hard" algorithm, and thus makes a brute-force attacker pay penalties both in CPU and in memory. While scrypt's cryptographic soundness, like bcrypt's, is poorly researched, from a pure algorithmic perspective it's superior on all fronts.
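A comparable sketch for scrypt, again through Ruby's openssl library, shows the extra memory knob. OpenSSL::KDF.scrypt is only present when Ruby was built against an OpenSSL version that includes scrypt, so the sketch checks for it first. The parameter values are assumptions for illustration only.

```ruby
require 'openssl'

scrypt_password = 'correct horse battery staple' # example input only
scrypt_salt     = OpenSSL::Random.random_bytes(16)

# Cost parameters (assumed values, tune for your own hardware):
#   N: CPU/memory cost, must be a power of two
#   r: block size
#   p: parallelism
# Memory use is roughly 128 * N * r bytes, so about 16 MiB here.
scrypt_params = { N: 2**14, r: 8, p: 1, length: 32 }

scrypt_key =
  if OpenSSL::KDF.respond_to?(:scrypt)
    OpenSSL::KDF.scrypt(scrypt_password, salt: scrypt_salt, **scrypt_params)
  end

if scrypt_key
  puts scrypt_key.unpack('H*').first
else
  puts 'scrypt is not available in this OpenSSL build'
end
```

Doubling N doubles both the time and the memory each guess costs, which is what makes the algorithm expensive to attack with hardware that has many cores but little memory per core.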

The next time you need to pick a key derivation function, please, don't use bcrypt.

Thursday, March 8, 2012

Announcing Lightrail: Lightweight Rails stack for HTML5/JS applications

There's been a lot of debate lately surrounding Rails' suitability for the server stack underlying modern HTML5/JS applications. Having used Rails for some four years for this purpose, and worked with a number of Rails core members, this is a problem I think the Ruby community has solved wonderfully, yet some are confused as to what solutions are available or the way forward.

Rails 3 provided enormous advances in terms of letting you specialize what Rails provides to the problem at hand. However, to a certain extent this goes against the Rails mantra of "convention over configuration". While Rails 3 provides ample opportunities for configuration, as Rubyists, we shouldn't have to configure anything, right?

I'm both a huge believer in Rails' suitability as a backend for modern client-heavy HTML5/JS applications and someone experienced in building such applications. Rails is, was, and continues to be a game-changer for modern web development. ActionController::Metal provides the bare minimum needed to build apps which don't need the complete set of HTML-generating abstractions provided by ActionView, but need more tools than a more minimalistic framework like Sinatra makes available. Between Sinatra and Rails lies an unaddressed middle ground, one where, in theory, you should be able to build an ActionController::Metal stack appropriate to your needs, but maybe this is too daunting a task.

For you, the JSON API builder, who wants more than Sinatra but less than Rails... I have what you desire. Introducing Lightrail:


What is Lightrail? Lightrail is Strobe's ActionController::Metal stack for HTML5 applications, originally used to provide the backend APIs for Strobecorp.com and its frontend HTML5/JS application authored with SproutCore (which has been superseded by Ember.js).

Lightrail contains everything you need to build lightweight applications on the Rails stack which serve only JSON, and furthermore, contains an innovative system for building JSON APIs around your objects. Rather than adding a fat #to_json method on your models, or using a template to construct your JSON, Lightrail allows you to map the JSON serializations of your objects to specific wrappers that know how to serialize specific objects. Like using #to_json, this makes it easy to recursively serialize nested objects to JSON without having to use ActionView voodoo like invoking other renderers. Besides that, it still separates the concerns of what your domain objects are and how they serialize to JSON. If you've tried to build JSON APIs in Rails and found the existing mechanisms for JSON serialization lacking, please try out Lightrail::Wrapper and let me know what you think.
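To give a feel for the general pattern, here is a plain-Ruby sketch of keeping serialization in per-class wrappers. It is not Lightrail::Wrapper's actual API: the Wrappers module, its register and wrap methods, and the Author and Post structs are all hypothetical.

```ruby
require 'json'

# Hypothetical domain objects with no JSON knowledge of their own.
Author = Struct.new(:name)
Post   = Struct.new(:title, :author)

# Hypothetical registry mapping a domain class to the wrapper that serializes it.
module Wrappers
  REGISTRY = {}

  def self.register(klass, &block)
    REGISTRY[klass] = block
  end

  # Look up the wrapper for the object's class and build a plain Hash.
  def self.wrap(object)
    REGISTRY.fetch(object.class).call(object)
  end
end

Wrappers.register(Author) { |author| { 'name' => author.name } }

# A wrapper serializes nested objects by calling Wrappers.wrap again.
Wrappers.register(Post) do |post|
  { 'title' => post.title, 'author' => Wrappers.wrap(post.author) }
end

post = Post.new('Hello', Author.new('Ada'))
json = JSON.generate(Wrappers.wrap(post))
puts json
```

The domain classes stay free of presentation code, and changing the shape of the API means editing a wrapper rather than a model.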

Lightrail is something of an experiment. I didn't write it, but it's software I believe in so much I'd like to support it and see if people are interested in it. Rather than competing with Rails, Lightrail takes the latest, greatest Rails stack and reconfigures it for lightweight applications that provide a JSON API exclusively. Lightrail builds upon all of the modularity that Rails 3 brings to the table, and simply and easily delivers a lightweight stack which is still suitable for complex applications.

Please let me know if Lightrail seems like a good idea to you and if you'd like to help support it. As I have my hands in an awful lot of other open source projects, Lightrail isn't the sort of thing I can support full time. However, if you have some time to spare and ideas to contribute, I am definitely looking for people to help maintain and improve this project.

If you're interested in using Lightrail or helping out with its development, sign up for the mailing list. Just send any message to lightrail@librelist.com to join.

Monday, March 5, 2012

Why critics of Rails have it all wrong (and Ruby's bright multicore future)

Edit: Contrary to what I said here, José Valim is not stepping down from Rails core, he is merely on sabbatical. My bad.

Lately I've been getting the feeling the Ruby community has gotten a bit emo. The enthusiasm surrounding how easy Ruby makes it to write clean, concise, well-tested web applications quickly is fading. Rails has become merely a day job for many. Whatever hype surrounded Rails at its inception has died down into people who are just getting work done.

Meanwhile, Node.js is the new hotness, and many in the Node community have sought to build Node up by bringing Ruby and Rails down. I know that once upon a time Ruby enthusiasts were doing this sort of thing to Java. However, the tables have turned, and where Ruby used to be the mudslinging hype-monkey, it's now become the whipping boy and Node.js the new provocateur.

The sad thing is many of these people are former or current Rubyists who have taken a liking to Node and build it up by spreading blatant untruths about Ruby. I won't go as far as to call them liars, but at the very least, they are extremely misinformed, ignorant of the state of the Ruby ecosystem, and pushing their own agendas.

Jeremy Ashkenas, the creator of CoffeeScript, recently trashed Rails 3 and claimed "Node.js won":


The idea that Rails 3 was a major step backward was recently reiterated by both Giles Bowkett and Matt Aimonetti. Both of them painted building ActionController::Metal applications as some sort of byzantine, impossible task which can only be accomplished by a Rails core member. Are people actually building lightweight Rails applications using the newfound modularity of Rails 3?


Jose Valim, (now former) Rails core member, published a small, simple gist illustrating how to build barebones apps on ActionController::Metal (one of the most forked gists I've ever seen) which is further documented in his book Crafting Rails Applications. In just 50 lines of code you can strip Rails down to its core, making it ideal for use in modern client-heavy HTML5 applications. The funny thing about this gist is that while the idea of a 50 line Rails app seems pretty impressive, the basis of that gist is what Rails 3 puts into your config/boot.rb, environment.rb, and application.rb, just combined into a single file. Did I just blow your mind? Sadly, all the (in my opinion completely undeserved) bad press seems to have made Jose emo as well, and he has stepped down from Rails to pursue his Elixir language.

ActionController::Metal-based applications (along with apps written in Clojure) were the basis of our backend at Strobe, where we sought to ease the pains of people building modern client-heavy HTML5/JS applications with frameworks including SproutCore/Ember, Backbone, and Spine. ActionController::Metal provided a great, fully-featured, mature, and modular platform for us to build applications on top of, and Strobe's ActionController::Metal stack for client-heavy HTML5/JS applications is available on Github. The apps we built with the Strobe ActionController::Metal stack talked only JSON and our frontend was an HTML5/JS application written with SproutCore.

Before Strobe, I worked at a company building rich HTML/JS applications for digital television. Our backend was written in Rails. Our frontends were Flash and HTML/JS applications, the latter of which were single-page client-heavy HTML/JS apps that were packaged in .zip files and installed onto digital televisions and set top boxes, a sort of weird hybrid of web technologies and installable applications. Our Rails application didn't have any views, but provided only a JSON API for the HTML/JS frontend to consume.

Rails was great for this, because it provided the sort of high level abstractions we needed in order to be productive, ensure our application was well-tested, and above all else provided the necessary foundation for clean, maintainable code. I was doing this in 2008, and even then this was already considered old hat in the Rails community. In case you're not paying attention, that's one year before Node even existed.

Modern HTML5/JS apps depend on beautiful, consistent RESTful JSON APIs. This is a great way to develop rich interactive applications, because it separates the concerns of what the backend business logic is doing from the view layer entirely. Separate teams, each specialized in their role, can develop the frontend and backend independently, the frontend people concerned with creating a great user experience, and the backend people concerned with building a great API the frontend application can consume.

Rails is great for JSON APIs.

And yet this meme persists, that somehow Rails is actually bad at JSON APIs. Those who propagate this meme insist that Rails has lost its edge, and that only Node understands the needs of these sorts of modern client-heavy web applications. Giles recently attempted to argue this position:


Giles recently blogged about this issue at length. Let's look at what he has to say about ActionController::Metal and the new level of modularity and clean design that Rails 3 brings to the table:


So Jose wrote a great book about the incredible power of Rails 3's new modular APIs... but... but... but what?

WARD CUNNINGHAM BITCHES. TWEETS > BOOKS. NODE WINS. QED.

Hurrrrrrrr? Ward Cunningham is a cool guy and his concept of a Wiki was a transformative technology for the web, but what the fuck does that have to do with Rails 3's new modular APIs or Jose's book? I think that's what people in logical debate circles call a "non-sequitur".

Perhaps there's still a cogent argument to be had here. Let's dig deeper:


Okay, so the problem is there's not a damn simple way to do websockets. OH WAIT, THERE IS:


Cramp is an awesome, easy-to-use websockets/server-sent events framework (with socket.io support) which runs on Rainbows or Thin, and Thin is a great web server. According to my benchmarks it's approximately the same speed as Node's web server:

Web Server            Throughput  Latency
----------            ----------  -------
Thin    (1.2.11)      8747 reqs/s (7.3 ms/req)
Node.js (0.6.5)       9023 reqs/s (7.1 ms/req)
Yes folks, Node isn't significantly faster than Ruby at web server performance. They're about the same.
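To put a number on "about the same", here is a quick back-of-the-envelope calculation using only the figures from the table above. The helper name `percent_difference` is made up for this sketch, and the result says nothing about how the benchmark itself was run.

```ruby
# Figures copied from the benchmark table above.
thin = { reqs_per_sec: 8747.0, ms_per_req: 7.3 }
node = { reqs_per_sec: 9023.0, ms_per_req: 7.1 }

# Hypothetical helper: how much larger b is than a, as a percentage of a.
def percent_difference(a, b)
  ((b - a) / a * 100).round(1)
end

throughput_gap = percent_difference(thin[:reqs_per_sec], node[:reqs_per_sec])
latency_gap    = percent_difference(node[:ms_per_req], thin[:ms_per_req])

puts "Node throughput advantage over Thin: #{throughput_gap}%"  # 3.2%
puts "Thin latency overhead versus Node:   #{latency_gap}%"     # 2.8%
```

A gap of roughly three percent either way is the kind of difference that usually disappears into the noise once a real application is doing actual work behind the web server.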

Giles also bemoans bundler, because typing "bundle exec" represents ceremony, and using any of the myriad solutions to avoid typing "bundle exec", such as bundler binstubs or rvm gemsets, represents configuration which violates the Rails mantra of "convention over configuration". npm, he argues, is that much easier. I'm sure we would all love to not have to add a one line .rvmrc file to each project to avoid typing "bundle exec", but uhh Giles, bro, mountain out of a molehill much?

Meanwhile, let's check out how convention over configuration is going in the JavaScript world:


But enough about Giles... what kinds of awesome, modern HTML5 applications are people using Rails to build?

I think one of the best examples of this sort of application is Travis CI. Travis is an open source distributed build system with an Ember-based frontend and a Rails backend. Travis's interface shows, in real time, the state of all builds across the entire (distributed) system, allows you to interactively explore the history, see the distributed build matrix completing jobs in realtime, and even have it stream the console output of builds in progress directly to your browser as they complete. It's an amazing, modern client-heavy HTML5/JS application, and it's built on Rails.

Who else is using Ruby/Rails for their frontend? Oh, just Twitter, LivingSocial, Groupon, Heroku, EngineYard, Github, Square, Zendesk, Shopify, Yammer, Braintree, Boundary, Stripe, Parse, Simple, and of course let's not forget 37signals. Rails is the technology underlying the frontend web stack of many huge businesses. Many of these companies have client-heavy HTML5/JS applications which consume a JSON API coming out of Rails. Many of them have APIs that are routinely cited as archetypical RESTful JSON APIs. Many of them have top notch engineering teams that choose the best tools for the job and use many languages for many different tasks. Many of them were founded "post-Node" and had the opportunity to choose Node as their frontend web technology, and while they may use Node in some capacity, their main consumer-facing sites are written with Rails.

Node is three years old now. Where are the Node.js success stories? Who's built a brand on top of Node? Nodejitsu? Hubot? Is Node anything more than a pyramid scheme or a platform for Campfire bots? Where Rails' selling points eschewed performance and instead focused on clear code, rapid development, extensive testing, and quick time-to-market, Node's selling points seem to universally revolve around its insanely fast, destroy the internet fast performance (benchmarks not provided). Meanwhile code quality is de-emphasized and large Node programs degrade into incomprehensible, byzantine structures of callbacks and flow-control libraries, instead of being written in sequential code, you know, the code you can read:

 

What about Ruby in general? What advancements in the Ruby ecosystem are worth getting excited about?

JRuby is maturing into a high-performance Ruby implementation which taps the JVM's advanced features including the HotSpot compiler, multiple pluggable garbage collectors, and parallel multithreading which makes it suitable for multicore applications. One thing I think sets JRuby apart is that it's the most mature language on the JVM which didn't start there. Other projects to implement non-JVM languages on top of the JVM, such as Rhino and Jython, have languished, while JRuby keeps going strong.

The most exciting development in JRuby is Java 7's new InvokeDynamic feature. The Java Virtual Machine was originally designed for the statically-typed Java language, but has its roots in dynamic languages, namely Smalltalk. With InvokeDynamic, the JVM has come full circle and now natively supports dynamic languages like Ruby. InvokeDynamic provides the necessary information to the JVM's HotSpot compiler to generate clean native code whenever Ruby methods are called, in addition to many other potential optimizations. So how much faster will InvokeDynamic make Ruby?


Rubinius, a clean-room Ruby virtual machine based on the Smalltalk-80 architecture, is also a very exciting prospect for the Ruby community as it matures and reaches production quality. It features an LLVM-based JIT compiler, parallel thread execution, and advanced garbage collection, also making it suitable for multicore applications. Beyond being an awesome Ruby implementation, Rubinius has evolved into a true polyglot platform and now features multiple Rubinius-specific language implementations including Fancy and Atomy.

MacRuby also eliminated the GIL from their implementation and now supports parallel thread execution along with an LLVM-based JIT compiler.

There are no fewer than three Ruby implementations which now support thread-level parallelism and thus multicore CPUs. This is especially relevant in a time when computing is undergoing a sort of phase transition from single-threaded sequential applications to massively multithreaded concurrent applications and distributed systems made out of these multithreaded applications.

It wasn't too long ago that having even four CPU cores in your home computer seemed like a lot, and now 16-core commodity AMD CPUs are available. The future is multicore, and if your programming language doesn't have a multicore strategy, its usefulness is vanishing. Following Moore's Law, the number of cores in a CPU is set to explode exponentially. Is your programming language prepared?

Thanks to JRuby and Rubinius, Ruby can take advantage of multicore CPUs. This still leaves the small matter that multithreaded programming is, uhh, hard. Fortunately I have some ideas about that.

Celluloid is an actor-based concurrent object system that tries to pick up on the concurrent object research that was hot in the mid-90's but died shortly after the web gained popularity. In the '90s concurrent objects were ahead of their time, but with the advent of massively multicore CPUs I believe it's an area of computer science research that's worth reviving.

Celluloid packages up Ruby's core concurrency features into a simple, easy-to-use package that doesn't require any modifications to the language. Where many functional languages solve the issues surrounding concurrency with immutable state, Celluloid solves it with encapsulation (more information is available on the Celluloid github page).

Celluloid takes advantage of many of the features of Ruby, including parallel threads, fibers (coroutines), method_missing (proxy objects), and duck typing. There aren't many other languages with this particular mix of features. Python probably comes the closest, aside from multicore execution due to its GIL. Jython supports parallel thread execution thanks to the JVM but seems abandoned. For what it's worth, Python once had a concurrent object system quite similar to Celluloid back in the '90s called ATOM; unfortunately, the source code has been lost.
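To make that mix of features concrete, here is a toy sketch of a concurrent object built from nothing but Ruby's standard library: a thread, a queue used as a mailbox, and a method_missing proxy. This is emphatically not Celluloid's implementation or API. `ActorProxy` and `Counter` are names invented for this illustration, and it leaves out fibers, supervision, and asynchronous calls entirely.

```ruby
require "thread" # Queue is built in on modern Rubies; the require is harmless

# Hypothetical illustration only, NOT Celluloid's API. Wraps any object so
# that all calls to it are executed one at a time on a dedicated thread.
class ActorProxy
  def initialize(target)
    @target  = target
    @mailbox = Queue.new
    @thread  = Thread.new do
      # This is the only thread that ever touches @target.
      loop do
        message = @mailbox.pop
        break if message == :shutdown
        name, args, reply = message
        begin
          reply << [:ok, @target.public_send(name, *args)]
        rescue StandardError => e
          reply << [:error, e]
        end
      end
    end
  end

  # Turn an ordinary-looking method call into a message in the mailbox,
  # then block the caller until the actor thread replies.
  def method_missing(name, *args)
    return super unless @target.respond_to?(name)
    reply = Queue.new
    @mailbox << [name, args, reply]
    status, value = reply.pop
    raise value if status == :error
    value
  end

  # Duck typing: the proxy "quacks" like whatever it wraps.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end

  def shutdown
    @mailbox << :shutdown
    @thread.join
  end
end

# A plain Ruby object with mutable state and no locks of its own.
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

counter = ActorProxy.new(Counter.new)
callers = 4.times.map { Thread.new { 250.times { counter.increment } } }
callers.each(&:join)
total = counter.count
counter.shutdown
puts total # 1000
```

The point of the sketch is that `Counter` contains no synchronization at all. Its state is protected purely by encapsulation: callers can only reach it by sending messages through the proxy, and those messages are handled sequentially.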

Ruby is by far the best language available today to implement a system like Celluloid, and that alone makes me excited to be a Rubyist. Where Node.js gives you a hammer, the single-threaded event loop, Celluloid gives you a large toolbox and provides a singular framework of interoperable components which can be used to build arbitrary hybrids of concurrent multithreaded applications, event-based nonblocking applications (that are callback-free!), and distributed systems.

Ruby is a language which can survive the massively multicore future. Whether Node will stick around remains to be seen.

Tuesday, January 31, 2012

"Can you solve this problem for me on the whiteboard?"

Jim is a great chef. He's too modest to say that about himself, but he's worked either as head chef or assistant head chef at a number of restaurants. Everywhere he's worked he's been dependable and reliable, prepared great food, worked well with the other chefs, and is generally a fun guy to have in the kitchen. Unfortunately, due to the poor economy and some bad decisions by management, Jim's restaurant is about to close, so Jim is out of work and looking for a new job.

There's a new restaurant opening, a fancy place with many well-to-do investors. In Jim's world, chefs are hard to find, so Jim assumes he's a shoo-in for the job. It's a Mexican restaurant, which feels like a great fit for Jim because Mexican food is his specialty. Jim calls up the restaurant on the phone and chats with the manager about a chef position, and the manager likes what he hears enough to schedule a job interview for Jim.

Jim arrives at the interview and talks to the manager a bit. Things seem to be going well; Jim is in his element at a Mexican restaurant. Jim talks about his job history, how much he cares about having a fresh house salsa, and how good his Baja sauce is. "Look up the Yelp reviews of my Baja sauce!" remarks Jim. "It's the #1 reason people came to the last restaurant I worked at." The manager smiles and nods, and informs Jim he looks great on paper; however, the remainder of the interview will be conducted by all of the other chefs in the kitchen. "Awesome!" Jim thinks, "I have a rapport with other chefs. This should go smoothly."

The first chef walks in, sits down at the table, and coldly stares at Jim's resume. "Can you write down a recipe for me?" he asks Jim, "There's a whiteboard over there, can you write down your preferred recipe for crème brûlée?"

Jim is a bit dumbfounded, both by the request and being asked to demonstrate his cooking ability on a whiteboard. "I'm sorry," he says, "I don't know how to make crème brûlée. I thought this was a Mexican restaurant. Would you like to know my favorite recipe for Flan?"

"No, that won't do," the assistant chef says. "Please write down how you would prepare crème brûlée."

Jim is a bit taken aback, first because he's a specialist in Mexican food, and second because instead of being asked to cook, he's being asked to write stuff on a whiteboard. "I honestly don't know how to make crème brûlée," Jim says. "Perhaps you could let me google the recipe and I could actually try to prepare it for you, instead of just demonstrating a rote ability to memorize recipes and write them down on a whiteboard."

"No, that won't do," says the interviewer, who jots down "lack of confectionery skills" in his notes. "Can you at least attempt to write down how you would prepare crème brûlée?"

Jim feels embarrassed and lost. He's being asked to do something he would never have to do in a professional capacity, and worse, rather than actually doing it, he's being asked to describe how he would do it on a whiteboard. Perhaps this is a test of Jim's ability to think on his feet, but given the position he's being asked to interview for and the question he's been presented, it's certainly an unfair one. Jim picks up the black marker and thinks hard about what the possible ingredients of crème brûlée would be.

"Well," says Jim, "I'll need cream." Jim pulls the cap off the marker and attempts to write "1. Cream"; however, the marker is dry and the whiteboard is on wheels that roll back when Jim attempts to write. Jim only succeeds in making a long, barely perceptible mark on the whiteboard. Having made a messy mark on the whiteboard, Jim looks for an eraser but there isn't one.

"Yes," says the interviewer sarcastically, rolling his eyes, "obviously you need cream for crème brûlée. Try a different marker." Jim picks up the red marker and tries to write with that, with the same result: it's dried out and won't work. Frustrated, Jim puts it down and tries the green marker, which works fine; however, the board swivels vertically as he tries to write. Jim grabs the board in the upper right corner and finally manages to jot down "1. Cream".

"Okay, we have the most obvious ingredient down," says the assistant chef. "Can you think of any other ingredients that would go into crème brûlée?"

"Sugar," says Jim. The assistant chef nods, and Jim writes down sugar. "What else?"

"Milk," says Jim, and he begins to write it down before he comes to the realization that the cream and milk are redundant. Jim doesn't often cook with cream. The interviewer shakes his head in exasperation and pinches the bridge of his nose as Jim looks dumbfounded. "It's not milk brûlée," he says. Unfortunately, there's no eraser, so Jim tries to erase "3. Milk" with his hand, smearing green ink all over the board and his hand before asking "do you have an eraser?" The interviewer looks around unenthusiastically before shrugging no. Jim continues smearing the marker's ink across the surface of the board with his fingertips in a desperate attempt to compensate for the absence of an eraser.

"Can you think of any other ingredients that might go in crème brûlée?" asks the interviewer, clearly bored.

"Eggs?" asks Jim. The interviewer nods. Jim writes down "eggs". "What else?" the interviewer asks. Jim stares at what he's written down: cream, sugar, eggs. "Well," says Jim, "I assume some kind of flavoring. Chocolate perhaps?"

"Wrong," says the interviewer. "Please write vanilla." Jim looks confused for a second and jots down vanilla as asked. The interviewer jots down "trouble with basic recipes" before asking "What other ingredients can you think of?"

Jim stares at the ingredients so far: cream, sugar, eggs, vanilla. "Perhaps some water?" Jim guesses. The interviewer nods, and Jim writes down water. "Now, what are you missing?" asks the interviewer.

Jim stares at the list: cream, sugar, eggs, vanilla, water. Those seem like they should be the basic ingredients, and the interviewer rejected additional flavoring that wasn't vanilla. Jim is stupefied... he can't think of anything else. Taking a stab in the dark, Jim suggests "Salt?"

The assistant chef does a facepalm and sighs, before looking up at Jim and stating the obvious solution: "the units. Your recipe is lacking units." The ambiguity of the interviewer's question has caught Jim off guard, especially when he professed no idea of what the recipe was to begin with, and worse, he has absolutely no idea what the units should be. He stares at the whiteboard for a while before asking "how much crème brûlée are we making?"

"That's up to you," says the interviewer, "how many servings would you like to prepare?"

Jim has absolutely no clue. He's not a confectioner, but he doesn't want to completely bomb the interview, so he ventures a guess. "I'd like to prepare 2 servings. Let's try a cup of cream, a teaspoon of vanilla, two tablespoons of sugar, 4 eggs, and a cup of water."

"Those aren't the right proportions," says the interviewer. "You should use a quart of cream, two quarts water, a teaspoon of vanilla extract, a cup of sugar, and six eggs to produce six servings. Let's move on to the recipe. Can you write it down on the whiteboard for me?"

Now Jim is completely lost. The ingredients of a recipe he has no clue about are something he can guess at, but how is he supposed to guess the recipe itself? He takes his best shot.

"Break the eggs into a bowl and whisk them with the cream and sugar," guesses Jim.

"Wrong," says the interviewer.

"Whisk them with the cream and vanilla?" asks Jim.

"Still wrong," says the interviewer, "but you were closer the first time."

"Do you want me to keep guessing?" asks Jim. The interviewer sighs, writes down "completely incompetent", stands up, and says "Thank you for your time. I'll go get the next person."

Jim stands by the whiteboard and feels confused and out of place. He wonders what crème brûlée has to do with preparing Mexican food. He sits down at the table and googles for crème brûlée on his phone, quickly scanning over the recipe and thinking "that doesn't look too hard at all, I could probably make a great crème brûlée if I had a little practice." The recipe for crème brûlée is in fact quite similar to Flan, and Jim can make great Flan, but unfortunately, the interviewer won't even know as he hasn't asked Jim to cook anything. The next interviewer comes into the room.

He sits down at the table and scans over Jim's résumé, making a few grunts after scrutinizing various items. "You didn't go to culinary school?"

"No," says Jim, "but I've loved cooking since I was a little kid. I used to cook dinner with my mom every night. I've been working professionally as a chef all my life, and I can prepare great food. Why don't you just take me to the kitchen and let me show you?"

"That won't be necessary," says the interviewer. "Now, can you please write on the whiteboard how you would prepare a cheese danish?" Unfortunately, Jim is not a pastry chef either.

.   .   .

The manager has returned to conclude the interview. "Well Jim," he says, "we've discussed the issue, and we don't think you'd be a good fit here."

At this point Jim is entirely expecting this response. Jim is most comfortable in a kitchen, preparing food hands on. He feels out of place trying to explain the theoretical act of preparing food with a whiteboard. Jim loves food so much that whenever he went out for a smoke break with his fellow chefs, he continued to talk about food even when they were on break. Unfortunately, during the interview he didn't get the opportunity to discuss food in this sort of context. Instead he was asked only pointed questions about food items he didn't know how to prepare.

"I see," says Jim. "Can I ask you one question before I go?"

"Okay," says the manager.

"Throughout this interview," Jim asked, "I was asked about preparing confections and pastries, but not once was I asked about preparing Mexican food. I thought this was a Mexican restaurant. Do you serve confections and pastries here?"

"No, we don't prepare confections or pastries," said the manager, "however we're all classically-trained pastry chefs. Some of the people you talked to are actually pretty new to Mexican food. But they've all gone through culinary school and have impeccable cooking skills because of it."

"Have you considered asking your candidates to actually cook instead of explain how they would theoretically prepare something on a whiteboard?"

"It's a lot easier for us to just use the whiteboard," he says, "and we want candidates who are as knowledgeable about the theory of cooking as the act of cooking."

Jim is extremely frustrated. It's not that he isn't knowledgeable about the theory of cooking, but he hasn't memorized the recipe to every foodstuff on earth. Confections and pastries are two areas that Jim knows very little about.

.   .   .


Jim arrives for another interview at another popular Mexican restaurant. On his way in he notes the health inspector's grade on the certificate displayed on the window: a 100%! Jim doesn't think he's actually seen a 100% score before. Jim walks in and the manager is actually there to greet him for his interview. Jim's actually pretty close with the manager, having seen him around at various farmers markets, concerts, and other events, and Jim wonders why he went to see those crazy pastry chefs before coming here.


"Hey Jim," the manager says, "I ate at your restaurant a few times. The food there was delicious!"


"Did you try the baja sauce?" asks Jim, "because I made that myself."


"Yes!" exclaims the manager, "the Baja sauce was so orgasmically delicious! Now I hope you don't mind, but we have a little test prepared. Come with me, please."


The manager leads Jim into their state of the art kitchen. It's hopping on a busy night, with people everywhere preparing the various menu items the restaurant has to offer. The order management system is fully automated using LCD displays which are mounted on the ceiling, tracking which items have been ordered, prepared, and served. The kitchen looks extremely clean and modern and the workflow seems highly efficient. The manager continues leading Jim around and shows him a prep area in the back of the kitchen which is unused. "You can work here," he says, "come with me and you can get your ingredients."


The manager continues leading Jim back to their refrigerator, where Jim notices an LCD display showing a realtime graph of the refrigerator's temperature, with bars for "too hot" and "too cold". Jim also notes in the visible history the temperature has remained within the guidelines the display is showing with very little alteration.


The manager pulls the latch to the door on the refrigerator and Jim feels a whoosh of cold air. Inside Jim finds a cornucopia of ingredients. Jim grasps some cilantro and inhales it, and the smell is deliciously fresh.  Jim darts about the refrigerator taking inventory, and discovers all the requisite ingredients are in place to concoct his own trademark Baja sauce.


"I know you can make awesome food," says the manager, "but you need to convince the owners you're a good chef. You have an hour," says the manager, "Your goal is to make delicious Mexican food."


.   .   .


58 minutes later the two co-owners of the restaurant have arrived along with the manager and have come to the back of the kitchen where Jim has been spending his time. He introduces himself and shows them the food he's prepared.


Jim has prepared some Baja fish tacos made of battered and fried red snapper, topped with Jim's own Baja sauce freshly made on-the-spot using only ingredients from the restaurant's well-stocked refrigerator. "I'm sorry it took so long," Jim says, "but really I spent 40 minutes making the sauce, and 10 minutes actually making the tacos."


The owners and the manager each grab one of Jim's tacos and bite in. They're unbelievably delicious, and it's all thanks to Jim's baja sauce. In his moment of triumph, Jim thought back to the first restaurant where he interviewed, and wondered why they were so caught up on their pastry-making ways. Clearly it takes a different kind of chef to make fish tacos than to make pastries, and perhaps Jim wasn't cut out for being a pastry chef. But when it came to making Mexican food, Jim was in his element. It seemed really weird that former pastry chefs-turned-owners of a Mexican restaurant would expect him to be a competent pastry chef, but perhaps that's what they're used to.

.   .   .

If you haven't already seen through the thinly-veiled allegory, I'm describing an interview process that, based on my experience, has become incredibly common in Silicon Valley. I'm not going to name names, first and foremost because I've signed NDAs, but to those of you who have a rigorously whiteboard-driven interview process, I can't comprehend what you're doing. At the very least, if you're asking me to write on a whiteboard, make sure you have good markers and good erasers. That said...

Programmers use computers. It's what we do and where we spend our time. If you can't at least give me a text editor and the toolchain for the language(s) you're interested in me using, you're wasting both our time. While I'm not afraid of using a whiteboard to help illustrate general problems, if you're asking me to write code on a whiteboard and judging me based on that, you're taking me out of my element and I'm not giving you a representative picture of who I am or how I code.

Don't get me wrong, whiteboards can be a great interview tool. Some of the best questions I've been asked have been presented to me on a whiteboard, but there the whiteboard was used to explain the concept: the interviewer wrote down what I said and used that to help frame a solution to the problem. It was a very different situation than "write code for me on the whiteboard."

Want to do better? Give a programmer a computer. Programmers like computers. Install common editors like vim, Emacs, and TextMate and let someone choose what they're most familiar with. Better yet, give them Internet access, or even let them use their own laptop. If you're looking over their shoulder the entire time, they can't "cheat" on the interview, and maybe you'll learn something new about their workflow and how they develop software. Who knows, maybe they have a better programming workflow than you which you can only discover by watching how they work on their computer. Limiting potential programming hires to a medium like a whiteboard is a degrading experience, and one that doesn't give you an indication of a person's potential.

Last but not least, treat your potential hires like people, because they are people. Take the time to get to know your potential hires before their interview. If they've developed open source software and gained some notoriety for it, that should be a major factor in your decision, more so than what you can distill from a cursory whiteboard interview. Bottom line, if you're interviewing someone for a software engineering position, and they have a Github, and you haven't spent at least 10 minutes familiarizing yourself with what's on their Github account before you even talk to them, you're doing yourself and your company a disservice.

My worst experience (still not naming names, but you know who you are) was a company specializing in Ruby who, not to beat around the bush, was the inspiration for this whole blog post. My first in-person conversation with someone technical at the company was a nonstop wall of coding questions on a whiteboard, with no preliminary discussion of what kind of people we are or what wavelength we're on. The entire interview was conducted in an interrogation of "solve my problems or I won't give you the job." This style is completely degrading to the person being interviewed. It's computer science trivia where the prize is a job. I'm sorry, but winning a computer science trivia contest isn't a good way to gauge potential employees.

Call it sour grapes if you want, but if you're the company I interviewed with and you're reading this, and remember who I am, and remember interviewing me, I think you missed out. And I think it's your fault, which is bad because I wanted to believe in your company. I hope you're not surprised if you never heard a word back from me.

My attitude is that if I'm a good Ruby programmer, and you're trying to hire me when the supply of Ruby programmers is low and demand is high, then before you even talk to me you should have spent at least 10 minutes Googling for my name, looking at my code, and figuring out who I am, rather than spending an hour subjecting me to a series of ad hoc programming questions in areas I may or may not specialize in. That 10 minutes of Google will tell you a lot more than asking me to come in and scribble stuff on a whiteboard.

I think this process has left me a bit more discerning about the companies I'll actually interview with. When you're trying to hire talented developers in a scarce market, please do your due diligence and don't insult somebody skilled by asking them to do a degrading whiteboard interview instead of looking at code they have freely available on the Internet or just looking over their shoulder as they code on a computer, preferably their own, at least the first time you get to know them. You may even learn something.

Saturday, December 31, 2011

Goodbye 2011!


I don't usually write a year-end retrospective blog post but I thought 2011 was a pretty interesting year with many noteworthy events we should all take notice of and keep in mind. The events were of a sort that several science fiction authors, particularly of the cyberpunk variety, have been writing about since the '80s, and now it feels like their visions are materializing.

Let's start with the Japan earthquake disaster and Fukushima. I'm something of a Japanophile, so it was very saddening to see the devastation wrought by the earthquake and the ensuing tsunami. The Fukushima nuclear disaster was very sad to see for many reasons. First because of all the people it hurt/killed, particularly all the volunteers who selflessly exposed themselves to radiation trying to keep the nuclear reaction occurring in the broken remains of the reactor's core under control. I'm also saddened by the devastating effect this will have on public opinion about nuclear power, at a time when liquid thorium nuclear reactors seem like a promising near-term solution to the world's energy problems. The Fukushima disaster reminded me of the Miyazaki-animated music video for Chage and Aska's "On Your Mark," which takes place in the shadow of an entombed nuclear reactor.

To me at least, "Wikileaks" was the story of the year. Bradley Manning managed to sneak confidential diplomatic cables out of a secured government facility by burning them onto a CD-RW previously filled with Lady Gaga. Wikileaks began disclosing them, Julian Assange rose to celebrity because of it, and Freenet failed to fulfill its goal of being a useful platform for exactly this use case. I included the photo above which I took on the way to work one day (complete with tacky Instagram filters), which really makes me remark, "why not 'Free Bradley Manning?'"

The collateral damage the Wikileaks disclosure had on the world as a whole was absolutely remarkable. I think the disclosure of the diplomatic cables was the catalyst for the Arab Spring, in which dictators fell like dominoes. As terrible as the atrocities that occurred during these struggles were, I can't help but feel that some positive progress was made in the world, even when the future seems incredibly uncertain such as in Egypt.

BitCoin certainly caught my attention. The world now has a workable p2p currency scheme using crypto-coins as in the Cryptonomicon (except sans the gold backing). It's anyone's guess as to where it will go at this point, but it's a remarkable demonstration of the power p2p systems have to solve large-scale general problems in theoretical terms. Whether BitCoin proves to be a practical real-world currency remains to be seen. Critics have equated the value of the currency to the hype surrounding it, comparing it to things like beanie babies.

Both Anonymous and LulzSec made this year extra surreal. If nothing else, 2011 will be remembered as a year that many institutions started paying a hell of a lot more attention to infosec. It's a lesson Sony will certainly remember after the entire PlayStation Network was hacked and taken down. I thoroughly enjoyed the techno-anarchist spin that LulzSec originally put on their hacks, claiming quite ostensibly that they did it "for the lulz," and was somewhat disappointed to see them turn political after that. Still, the entire experience of having them in the world certainly felt like something out of a cyberpunk novel.

Steve Jobs died. It's really hard to imagine the scope of the effect the guy had on my life. Love him or hate him, Steve Jobs created the company that made the first home computer I ever used. WindowMaker, the window manager I used for nearly a decade before switching to OS X, was inspired by NeXTstep. At the end of 2011, I have a computer, tablet, and phone all running evolutions of NeXT's OS. The software I'm writing, DCell, also follows in NeXT's footsteps. I have a half-written blog post about that I hope to get published soon.

The IPv4 address space was exhausted. It seems like IPv6 will actually become a reality soon, forced by the increasing costs of obtaining IPv4 addresses when they're scarce due to the technology's limitations.

I saw an entire string of announcements in cancer and AIDS research this year that sounded extremely promising. The announcements spanned everything from using HIV to cure cancer to a vaccine for HIV comparable to the polio vaccine. The ability for modern medicine to cure disease never ceases to astound me and I hope that we will see bona fide generalized cures for both cancer and AIDS quite soon.

One of the craziest things I saw this year was a NOVA special on immortality (available on Netflix, btw). They had footage of human organs grown in vitro, first by taking an organ from a donor (which could be an animal), using a special chemical to wash away everything but the cartilage, then using someone's own stem cells to grow a replacement organ from the cartilaginous skeleton of a particular organ. The special showed lungs that breathe and a heart that beats. The future of organ replacement will be growing you a brand-new replacement from your own cells, using the cartilage structure from a dead animal (or potentially in the future constructing the cartilage from a 3D blueprint).

The Large Hadron Collider supposedly found the Higgs boson. We'll see what future experiments say, but it seems like the standard model has been validated and the particles we expected to be there exist.

Oh yes, we got Osama. AMERICA! Fuck yea!

Occupy Wall Street emerged, almost a liberal reaction to the Tea Party, a movement whose anger wasn't centered directly on the government, but on the financial institutions who made out like bandits with TARP money. No major financial reform has taken place, and the institutions which were too big to fail are still too big to fail, and getting back to the same tricks which tanked our economy in the first place. I'm not sure when the US government is going to eventually make the decision to try to better regulate Wall Street, but I can't see it happening before the 2012 election.

Speaking of 2012, happy 2012 everyone! The Mayans and Terence McKenna believed that the world is going to end on December 21st. Let's hope humanity has many, many more years ahead of it :)

As for myself, I had quite the crazy year. I went to Switzerland and got back home to find my basement flooded. I lost two jobs and got two new jobs after half a decade of job security. I created Celluloid and stopped working on Reia (guess I should update my blog header). I moved to San Francisco, where I met Jose Valim, the scumbag who started working on a language just like Reia (just kidding Jose), then threw it away and started working on a much cooler version of the same language. I also met Chris Bertels, Steve Dekorte, Yehuda Katz, Carl Lerche, Ben Reesman, Tom Dale, Colin Campbell, Majd Taby, and Slava Pestov. I stuck it to Erlang as a language and Node as a community. And I happened to spend about 4 months unemployed, during which I worked on Celluloid, DCell, and surrounding projects practically full-time. It was pretty nice.

Happy new year everyone!

Tuesday, October 4, 2011

Node.js has jumped the shark

I've been trying to understand this whole Node.js thing that's been going on for the past few days. It's taught me a lot about how the Node.js community tries to reason about problems, and perhaps what's the real problem with Node.

It began with a relatively short rant by Ryan Dahl. Ryan talks about how he's mad at complex software, which is at least an arguable position. I liked his rant, but Ted Dziuba did not.

Ted Dziuba wrote a brilliant troll post which lambasted Ryan and the concept of Node.js in general. I'm a huge fan of Ted's previous trolling, especially uncov, his TechCrunch parody site, which trolled TechCrunch before TechCrunch became a parody of itself. In uncov's case, though, that took a few years. With Node.js, the reaction was almost instantaneous.

Ted's argument centered around the problem of doing non-trivial computation within the event loop of a system like Node, and how that would affect the performance of the system. To illustrate this point, he gave an intentionally deoptimized Fibonacci function, along with the boldface caveat "Yes, I know there's a closed form solution. Shouldn't you be in front of a mirror somewhere, figuring out how to introduce yourself to her?"
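To make the underlying problem concrete, here's a minimal sketch. It is not Ted's actual code: `slowFib` and `measureTimerDelay` are names I made up for illustration. It runs a deliberately naive Fibonacci on the main thread and reports how late a 0 ms timer fires as a result, since nothing else on the event loop can run while the computation is in progress.

```javascript
// Intentionally naive, exponential-time Fibonacci (illustrative only).
function slowFib(n) {
  return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
}

// Hypothetical helper: schedules a 0 ms timer, then runs `work` synchronously.
// `report` receives how many milliseconds passed before the timer fired.
function measureTimerDelay(work, report) {
  const scheduledAt = Date.now();
  setTimeout(() => report(Date.now() - scheduledAt), 0);
  work(); // the timer callback cannot run until this returns
}

measureTimerDelay(
  () => slowFib(30),
  (delayMs) => console.log(`0 ms timer fired after ${delayMs} ms`)
);
```

The bigger the argument to `slowFib`, the later the timer fires, and the same would go for any pending request handler in a real server.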

Reading through the lines of Ted's trolling, his point is "I'm not specifically talking about Fibonacci. I'm talking about non-trivial computation as a general problem in Node." He specifically chose a slow algorithm to illustrate his point. However, what was the Node community's reaction?

One reaction was to go out and implement Ted's intentionally slow Fibonacci algorithm in a bunch of different languages and benchmark how fast the VM could optimize away a crappy algorithm. Based on several benchmarks, V8 actually does a comparatively good job of optimizing away the crappy algorithm compared to a lot of other languages. Cool story bro!

In the Reddit comments for this story I first truly gazed into the abyss of Node.js insanity. Node provides a way to schedule an anonymous function to run the next time its event loop iterates using a function called nextTick. EventMachine (ugh) users may be familiar with this same pattern with EM.next_tick (which needs to die for most use cases, but that's the subject of another blog post).

Redditor "headzoo" suggests that we provide a "non-blocking solution" by unrolling the Fibonacci loop and executing one step of it per iteration of the Node.js event loop, so that one step is calculated for each I/O multiplexing system call Node makes. When Ted Dziuba asked "Have you taken epoll into your heart?," I think he had no idea how deeply Node people actually have.
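For reference, the suggested technique looks roughly like the sketch below. This is my own reconstruction, not headzoo's code, and `fibPerTick` is a hypothetical name. One caveat: in current Node releases, `process.nextTick` callbacks run before the event loop moves on to I/O, so the sketch uses `setImmediate`, which is the closer modern analogue of "one step per loop iteration." It also assumes a Node version with BigInt support.

```javascript
// Hypothetical helper: computes fib(n) one step per event-loop turn and
// passes the result (a BigInt) to `callback`.
function fibPerTick(n, callback) {
  let a = 0n;
  let b = 1n;
  let i = 0;

  function step() {
    if (i === n) return callback(a);
    [a, b] = [b, a + b]; // advance the sequence by exactly one step
    i += 1;
    setImmediate(step); // yield to the event loop before the next step
  }

  step();
}

fibPerTick(100, (result) => console.log(result.toString()));
```

Every step pays for a full trip around the event loop, which is exactly the overhead I'm complaining about.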

I don't really get what's going on in Node people's heads here. I think they seriously believe that throwing computation into the middle of an event loop is a good idea, because Node's event loop is FAST!!#$#! funroll-loops + nextTick = LUDICROUS SPEED!!

This crap is absolutely insane. For each iteration of the loop, Node is making an expensive system call. It's also defeating all of the mechanisms that V8's impressive JIT would use to optimize this problem, as substantiated by the previously mentioned Nodian's roflbenchmark. Clearly this is a very bad idea. But wait, it gets worse!

It wouldn't take long before another Node.js fan put together a GitHub project for this technique, which also added an "optimization": memoization! In addition to unrolling each iteration of the Fibonacci function to run within the event loop, it now caches the nth result. Now let's throw ApacheBench at it and see how fast Node can serve the precomputed result where n = 100.

Holy balls, it does 5390.22 requests per second serving a memoized value when I ask ApacheBench to request the same memoized value over and over!!#$! TAKE THAT TED DZIUBA! Is it any good? One word: yes. Did we miss the point that this isn't about Fibonacci? One word: yes. (Side note: I second raganwald in adding "Is it any good? Yes." to the top of all my READMEs)

Memoizing each member of the Fibonacci sequence makes this algorithm O(n) in memory. For shits and grins I installed Node-fib and asked it for the 1 millionth Fibonacci number. Node ground away (on a single core of my quad-core i7 MacBook) for about a minute before running out of memory, crashing the entire VM.
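Here's a minimal sketch of the memory difference. It is not Node-fib's actual code: `fibMemo`, `fibIter`, and the `memo` array are hypothetical names, and I'm assuming BigInt values so the numbers stay exact. The memoized version keeps every member of the sequence up to n alive, while the plain iterative version only ever holds two values.

```javascript
// Memoized version: the cache retains every value from fib(0) to fib(n).
const memo = [0n, 1n];
function fibMemo(n) {
  for (let i = memo.length; i <= n; i++) {
    memo[i] = memo[i - 1] + memo[i - 2];
  }
  return memo[n];
}

// Iterative version: only the last two values are retained at any time.
function fibIter(n) {
  let a = 0n;
  let b = 1n;
  for (let i = 0; i < n; i++) {
    [a, b] = [b, a + b];
  }
  return a;
}

console.log(fibMemo(50) === fibIter(50)); // same answer either way
console.log(memo.length); // number of cached entries after asking for fib(50)
```

Both loops do the same amount of arithmetic. The only thing the cache buys you is the ability to serve the same n again, at the cost of memory that grows with the largest n anyone has ever asked for.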

Okay, so mistakenly implementing an algorithm this pathologically bad when you're trying to prove a point to someone who's just trolling you is pretty bad, right? But it gets worse. 

After pointing this out, a member of the Node.js community (post now deleted) suggested I might have an obsolete version of Node with a 1GB heap limit (because Node uses a VM targeted at client-side JavaScript where 1GB heaps are an uncommon use case, but hey, let's throw it on the server!) and that I recompile without the 1GB restriction so that this retarded algorithm can continue eating up all my system RAM. He says this as if this is a good idea instead of, I dunno, using a better algorithm.

Clearly the next logical steps are to store previously computed Fibonacci numbers in MongoDB. Or you could build a system with Socket.IO which caches previously computed Fibonacci numbers client-side using a distributed hash table. Then you'd be well on your way to a multimillion dollar cloud-based Fibonacci number service. Eventually you'll probably want to build in an OpenMP C++ native extension to calculate Fibonacci numbers across multiple CPU cores. That way you'll have a truly roflscale Fibonacci web service.