Tuesday, June 29, 2010

Reia: Pluggable Parsers

One stand-out quality of the Ruby community is a fascination with obtaining and manipulating Ruby parse trees.  Such a fascination exists in many languages, but it's particularly weird in Ruby because until Ruby 1.9 there was no first-class way to obtain a Ruby parse tree.  People went spelunking with C code into Ruby's internals, ripping the parse tree right out and exposing it back to the Ruby environment.  Eventually Ruby parsers were implemented in Ruby itself in various projects.  Yet it remains that while Ruby as a language seems to attract parse tree tinkerers, the language itself does not provide first-class ways to satisfy their needs.

I firmly believe that being able to obtain a parse tree for the programming language you're using is important and should be a first-class language feature.  To that end, Reia supports a String#parse method:

>> "2+2".parse()
=> [(:binary_op,1,:'+',(:integer,1,2),(:integer,1,2))]

This parses the "2+2" string as Reia source code.  The result might remind you a little bit of Lisp: it's a Reia parse tree.  Right now there aren't immediate uses for Reia parse trees, but I'd soon like to add an interface for compiling/executing them.  Erlang supports a feature called "parse transforms" which allows on-the-fly transformations of Erlang syntax.  I'd also like to add such a feature to Reia.

If String#parse were just used to parse Reia source code it'd be a bit of a waste.  However, it can be used for more than just that.  For example, parsing JSON (as of tonight):

>> '{"foo": [1,2,3], "bar": [4,5,6]}'.parse(:json)
=> {"foo"=>[1,2,3],"bar"=>[4,5,6]}

After some recent problems dealing with JSON libraries in Ruby, I really felt JSON parsing should be part of the standard library.  With this syntax, it almost feels like JSON parsing is part of the core language.  Rubyists generally implement that sort of thing by monkeypatching the core types.  Reia lets anyone plug their own parser into String#parse by defining a module with a specially formed name, with no modifications to the core types required (which Reia doesn't let you do anyway).

To better understand how this works, let's take a look at how Reia implements String#parse:

def parse(format)

Given a format of :foobar, String#parse will capitalize the argument into "Foobar", then look for a "FoobarParser" module to parse itself with.  This means anyone can add a parser to the language just by defining a module whose name ends with "Parser" and which has a parse method that accepts a string as an argument.
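
For readers more at home outside Reia, here is a rough Python analogue of that name-based lookup.  It is a sketch under assumptions, not Reia's actual implementation: the Parser base class, its registry, and the free function parse() are all names invented for illustration, and Python's json module stands in for whatever JsonParser does internally.

```python
import json


class Parser:
    """Hypothetical base class: records every subclass under its class name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Parser.registry[cls.__name__] = cls


class JsonParser(Parser):
    """Defining a class with the right name is all it takes to plug in."""

    @staticmethod
    def parse(source):
        return json.loads(source)


def parse(source, fmt):
    """Capitalize the format, append "Parser", and dispatch by that name."""
    parser_name = fmt.capitalize() + "Parser"  # "json" becomes "JsonParser"
    parser = Parser.registry.get(parser_name)
    if parser is None:
        raise LookupError("no parser named " + parser_name)
    return parser.parse(source)


result = parse('{"foo": [1,2,3], "bar": [4,5,6]}', "json")
```

The point of the sketch is that parse() never mentions JSON: adding, say, a hypothetical YamlParser class would make parse(text, "yaml") work without touching the dispatcher or the string type.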

In short, anyone can add a parser to the language which can be called with a simple, elegant syntax.  No monkeypatching required.