Tuesday, May 11, 2010

Reia: Year 2

Yes, two years, the same as the number of heads on this turtle...

It's been two years since I committed the initial scanner to the Reia repository, which was the first commit of real code.  Last year I posted a large list of features which I had managed to crank out in the previous year.  But then I began to struggle with the codebase and implementation, and started to doubt some of the features I had been adding and suggested forking off the old implementation and starting over anew, as, alas, due to some poor design decisions Reia was unusably slow.

Rather than doing that, I decided to do a largely ground up rewrite, incorporating bits and pieces of the old code.  In the intervening time I had learned a lot about Erlang, like records are a necessary evil.  I had learned a lot more about how the Erlang compiler works and how people ordinarily write parse transforms, namely the extraordinary power of mapfold_subtrees.  After powering through the rewrite, the result was a much leaner, nimbler, and above all else comprehensible codebase with substantially better performance.

Unfortunately because of the rewrite I haven't managed to equal the same list of features the language could boast last year, but has picked up many new ones.  All your normal C-like bitwise operators now work.  C's ternary operator now works, and even supports the same wacky associativity as languages like C and Ruby (but unfortunately not the wacky left-associativity of PHP).  But even better, Reia now supports "magic rebinding" which allows you to use Reia's dicts with the same syntax as Ruby or Python, and even supports rebinding of a method's receiver using Ruby's method! syntax.

Recently I've added the basics of an immutable object system, and begun work on class-based polymorphism.  The goal is to put an OO-like face on a system similar to Erlang's parameterized modules.  Parameterized modules are an alternative way of storing structured data, created as an alternative to Erlang's records, one of the language's most shunned features.  However, it seems most people still steer clear of parameterized modules and find using them odd.  I hope Reia's immutable objects will provide the much sought after solution to this problem, by coupling state and function in the way Joe Armstrong so detests.

I would like for immutable objects to form the basis for almost everything in Reia.  The core types will be immutable objects.  Exceptions will be immutable objects.  External resources... you guessed it, immutable objects.

Does that mean processes, particularly ones like gen_servers, can also be immutable objects?  Maybe, but I'd much rather pursue making gen_server a first class part of the language.  Right now there's a considerable amount of boilerplate involved in creating a gen_server.  In the previous branch of Reia I managed to create a great proof of concept demonstrating a concurrent object system with "mutable" (yet behind the scenes still pure functional) instance variables using a syntax identical to Ruby's class definition syntax.  I still would like to add this back.

But in the meantime, I'm working on Reia again, immutable objects are on the way, and after that, the initial standard library and the first official release.

Sunday, February 21, 2010

Reia: new branch merged

If you've been following Reia, you're probably aware I started on a new branch intended to correct problems with the language's initial design and implementation. That branch has now been merged into "master" and is now the definitive implementation of Reia going forward.

Here's a short summary of the changes:
  • Significantly faster at loading code
  • All Reia core types are identical to their Erlang counterparts
  • Methods of all core types are now self-hosted in Reia (some such as List couldn't be before)
  • All code is compiled: no more metacircular evaluation
  • Records are used extensively in the compiler implementation, making the code easier to understand and change
  • Standard bitwise operators from the C family of languages are now implemented
  • Standard +=, *= etc assignment operators are now implemented
  • "Magic Rebinding": members of lists, tuples, and dicts can now be assigned directly. "Bang" operator allows modification of the methods of receivers.
  • Concurrent object model is gone (to be reintroduced in Reia 2.0)
  • Old standard library is gone (to be rewritten with immutable objects)
As part of this merge, I've gone through the Reia Wiki, bring it up-to-date with all of the changes which have gone into the new branch. I believe I've caught them all, but if you see anything documented in the Wiki which doesn't seem to work, please let me know.

These represent that majority of hurdles which needed to be overcome before a Reia 0.1 release. However, one major feature is still missing: immutable objects. Reia will implement all types, including its core types, and user defined types, as immutable objects. You can think of these like objects in any other language, except once constructed they cannot be changed. Objects with changing state will be provided by the concurrent object model of Reia 2.0.

This means Reia will be an "everything is an object" language after all. I will provide the details of immutable objects in a follow up blog post.

Saturday, January 9, 2010

Reia: now over an order of magnitude faster at loading code

Happy New Year everyone! I really hope 2010 will be a great year for Reia, and to start it off I'd like to announce some preliminary results on the new branch which I find incredibly exciting. Very recently I've imported all the compiler passes from the old branch necessary to run parts of the test suite. The new compiler and runtime are able to execute these parts of the test suite, and the performance difference between the old and new branches are night and day.

First, a peek at the old branch:

Finished in 14.461775 seconds
69 assertions, 0 failures, 0 errors


That 14.5 seconds only includes the time it takes to execute the test runner, and not the initial environment setup and loading of the standard library. If we factor all that in, the wall time of executing a relatively minimal test suite jumps up to a whopping 30 seconds of wall time.

Now, let's look at the new branch:

Finished in 0.422597 seconds
33 assertions, 0 failures, 0 errors


Yowza! I removed a number of the old tests that pertain to features which no longer exist (like the concurrent object system), but even still the difference is striking. The total wall time to execute the tests is now approximately 3.6 seconds! Note that the old branch has a relatively small standard library it loads, so comparing wall time is a bit unfair, but even still, this is a massive improvement.

Lies, damn lies, and statistics are all well and good, but to really drive the point home, I'd like to show you some graphs of the CPU usage between the two branches. First the old branch:


As you can see, the CPU usage of the previous branch when running the test suite juts out like a butte in a desert landscape. This is running on a dual core machine, and the Erlang VM is sufficiently capable of taxing both CPUs almost to their maximum just running the Reia test suite. Now let's look at the new branch:


Can you tell when the tests were running? In case you can't it's over there on the far right, the part that sticks out slightly farther than the ambient noise of all the other programs running on my computer.

I'm typically not one to care too much about performance. I often bandy about the phrase "FOR SPEED" in jest, as once upon a time I used to be a C programmer overly concerned with performance while not paying enough attention to things like development time and correctness. However, that attitude can only go so far. Sometimes you must recognize that the code you've written has some absurd performance problems due to bad design decisions, and the only solution is a ground-up rewrite.

A ground-up rewrite is not something I undertake lightly, and I recognize that in doing so I've taken a lot of the wind out of Reia's sails. However, I think this rewrite was very much justified, and in the process, I've not only improved the language's performance, but corrected a lot of the mistakes I made in the original implementation. I know Erlang much better now, have made extensive use of records which makes the codebase much easier to work with and change, and have implemented some of the features I always longed for in the original branch including the long sought after "magic rebinding" which can solve some of the usability concerns that have been expressed with Erlang.

I've also restructured all of the compiler passes to use a construct similar to Erlang's erl_syntax:mapfold_subtrees function, which I hope improves the clarity of the compiler and allows others to better understand how it functions.

In my past development of both the old and new branch, I held off on starting with TDD due to my desire for a self-hosted test suite. However, from now on I vow to use TDD whenever I add features to the language, writing tests for them first then beginning the implementation. I've already begun doing this now that the test suite is working. I will need to go back and add tests for the existing language features which I developed without TDD, but once that's done, I promise I won't add new features without writing tests for them first.

Finally, there's one further optimization that can now be performed: caching the generated bytecode when Reia sources are loaded, so that unless the original source file has changed it doesn't need to be recompiled every time it is loaded. This is similar to an optimization performed by other bytecode-based VMs for scripting languages, including Python "byte-compiling" and the compiled form output by the Rubinius VM for Ruby. By caching the compiled output, the amount of time needed to load code is further reduced, at least every time after the code is initially loaded:

Finished in 0.029055 seconds
33 assertions, 0 failures, 0 errors


After the test suite is executed once and "byte compiled", it runs yet another order of magnitude faster. That means the new branch can load code somewhere between 2-3 orders of magnitude faster after it has been byte compiled. That's a serious speed improvement.

Sunday, December 13, 2009

Reia: now with "Magic Rebinding"

In Ruby, thanks to its first class syntax for "hashes" and mutable state, it's quite easy to do:
h = {}
h[key] = value
The equivalent code in Erlang is noisier, thanks to immutable state and single assignment:
Dict1 = dict:new(),
Dict2 = dict:store(Key, Value, Dict1).
Since Reia lacks mutable state, it never before had syntax as simple as Ruby's... but now it does!

I have been trying to hold off on adding syntactic sugar like this to my new "minimalist" branch of Reia. However, this is a feature I meant to add to the old implementation, and tried to retrofit it in long after the implementation had grown quite complex, never managing to succeed. I decided to tackle it now, and I'm happy to announce that it works! Furthermore, it can be used in complex pattern matching expressions:
>> m = {}
=> {}
>> (m[:foo], m[:bar], m[:baz]) = (1,2,3)
=> (1,2,3)
>> m
=> {:bar=>2,:baz=>3,:foo=>1}
So what is going on here exactly? Reia is an immutable state language, so surely I'm not mutating the value that "m" references.

In these cases, Reia is "altering" the local variable binding. Each time you change a member of a map ("hash" for you Ruby folks, "dict" for you Erlang folks), a new version of that map is calculated, then bound to "m". Behind the scenes, the Reia compiler is translating these calls into destructive assignments.

Maps, Tuples, and even Lists now support assignments in this way (although Lists only for consistency's sake... I hate to think of people actually setting values in lists by index). Tuples and Lists even support Ruby-style negative indexing:
>> t = (1,2,3)
=> (1,2,3)
>> t[-1] = 42
=> 42
>> t
=> (1,2,42)
I plan on eventually exposing this functionality to user-defined types as well, in the form of "bang methods" on immutable objects. Users of Ruby are likely familiar with them:
>> arr = [1,2,3]
=> [1,2,3]
>> arr.reverse; arr
=> [1,2,3]
>> arr.reverse!; arr
=> [3,2,1]
Here you can see that calling the "reverse" method on an array (without the !) does not modify the original array in-place. Instead, it returns a new array in reverse order. The complimentary "reverse!" method performs an in-place modification of the array.

The "method!" idiom in Ruby is generally used to indicate methods that modify their receivers as opposed to versions which do not. However this is not a solid requirement, and "!" is often added to any methods considered "dangerous". There's no specific meaning to putting "!" on the end of a method and certainly nothing Ruby does differently at the language level.

In Reia, "bang methods" will be a first class language construct, and will always rebind the receiver with the return value of the method. This will provide a simple way to allow "in place" modifications of immutable objects, by having "bang methods" create and return a new immutable object.

It's the best of both worlds: the ease of use that comes from mutable state, with the concurrency benefits of being completely immutable.

Sunday, November 29, 2009

Reia: now fully compiled (and sometimes JITed)

One of the most frequent questions I get about Reia is its execution model. Is it a simple parse tree walker? Is it interpreted? Is it compiled?

The old branch of Reia made extensive use of the Erlang metacircular interpreter, which is a parse tree walker. Any code within models or classes, however, was compiled to Erlang bytecode. Reia did autodetect HiPE (the Erlang native code compiler/JIT) when available, and would use it when compiling modules/classes.

The new branch of Reia does not make use of the Erlang metacircular interpreter at all. Instead all code, including any code which is eval'd, is translated into Erlang then compiled by the Erlang compiler. This means Reia is 100% fully compiled, and will compile to native code when your Erlang interpreter supports it.

"AutoHiPE" is off by default for now, if only because HiPE has a slightly greater startup time than the normal BEAM interpreter.

HiPE has some additional problems as well. It has limited platform support. x86-64 is not one of the supported platforms. Given that BEAM is fundamentally a register machine you think it'd be ripe for compilation to native code via something like LLVM.

But for now, enjoy native code compilation on the platforms that support it by passing {autohipe, true} as a compiler option.

Wednesday, November 25, 2009

The new Reia: now without rainbow-farting Domo-kuns

Well, what can be said: the buzz about Reia has died down, and several people are confused about the state of the project. Is it going forward? Is it being actively developed? What's this Peridot thing?

I recently attended RubyConf and was asked by many people about the state of Reia. My answer was always "wow, I should really blog about that" to avoid repeating the same things to people over and over (after all, the Ruby community emphasizes DRY, right?). Well, here is that blog.

The State of Reia: Forget Peridot

To put it bluntly, Reia is undergoing a ground-up rewrite. This does not mean I am abandoning the previous codebase. Far from it. Instead I am rewriting the parts of it which are problematic, and pulling in code from the previous implementation where it makes sense.

When I talk to various people who are implementing languages for fun, it seems most people are interested in producing one-off experiments with little attention to developing them into "real" languages someday that might actually be used in production by a lot of people. This certainly makes sense, and that's how I started with Reia. However, buzz grew, and so did my investment in the project. Every indicator I've been given has shown me that Reia is something a lot of people are potentially interested in, so half-assing it isn't going to cut it.

The new Reia roadmap calls for reaching complete feature parity with Erlang with as minimal an implementation as possible, then making it rock solid. At this point, while Reia will lack many of the "dream features" of the previous implementation, it will be generally usable as an alternative to Erlang. Once new language features become available existing programs can be modified to make use of them. After all this is done syntactic sugar can be added, and finally, the concurrent object model.

Initially I had thought of splitting off these additional features into a separate language, which I called "Peridot", but after long and careful consideration, this doesn't make sense. The new Reia will start as an alternative syntax for Erlang (with destructive assignment) but will grow to include all of the features I had originally envisioned.

What Was Wrong with the Previous Implementation?

Why rebuild Reia from the ground up? Can't the existing implementation be salvaged and molded into something usable?

There are a number of problems with the existing implementation. Some stem from my lack of knowledge of Erlang itself when I started. Some of them stem from my lack of knowledge of best practices when writing a language compiler in Erlang. Others stem from the organic way the language evolved. But above everything else, the problems stem from one feature I deemed absolutely necessary: eval.

I like eval lots! If nothing else, it exists for one reason: so Reia can have an interactive shell (i.e. a read-eval-print loop, a.k.a. REPL). I spend lots of my time hacking on Ruby code by interacting with it from irb, the interactive Ruby interpreter. I have a very hard time working with languages that do not provide some form of interactive interpreter nowadays.

The biggest problem with implementing eval is that you have to write your own implementation for your language. In the previous version of Reia I tried to sidestep that by using erl_eval, the Erlang metacircular interpreter, as my eval implementation. Unfortunately, to facilitate this, I ended up implementing the entire code loading process in a way which shoved everything to erl_eval. The result ended up looking something like this:

the previous wonky ass Reia compiler

When code entered the system, it was first parsed and converted to Erlang by reia_compiler (the Domo-kuns). For module and class declarations, the code was compiled down to Erlang bytecode (the rainbow farts) which were in turn injected into the toplevel Erlang AST. In other words, the toplevel scope of any Reia file was never compiled, but simply stored as expressions, and where module/class declarations existed, instructions to load the compiled module (which itself was injected directly into the AST) were issued. This provides somewhat Ruby-like semantics for module/class declarations: they're "loaded" at the time they're declared.

The resulting Erlang AST, complete with compiled class/module fragments, was then shoved through the Erlang metacircular interpreter, erl_eval (seen in the diagram as the tornado). As you might guess, compared to compiled Erlang the metacircular interpreter is reaaaaaaaaally slow.

Once everything was said and done, the resulting class/modules were loaded into the Erlang code server, pictured here as a hungry Joe Armstrong going *nom* *nom* *nom*.

Making Reia Faster


As you may guess, an as-yet-unstated goal of this rewrite is to improve the speed of code-loading. Previously, Reia could not have a large standard library, because it took so long to load code. Furthermore, implementing a mechanism for caching compiled bytecode was impossible due to the API structure.

The new code-loading API directly compiles everything, including any code which is eval'd. This not only makes everything significantly faster but also facilitates the possibility of caching and also various bugs surrounding the eval implementation. From what I've gathered elsewhere, most compiled languages generally ditch any form of metacircular interpreter and implement eval by compiling temporary modules.

Doing this in Erlang is hard, because certain expressions in Erlang create things which exist beyond when code is being evaluated, namely processes and lambdas (a.k.a. funs). This was a vexing problem to me for quite some time, but after talking with Robert "Hello Robert" Virding, one of Erlang's creators, I believe I've come upon a workable solution, even if it's a bit of a hack.

Reia will implement its own "garbage collector" process for eval'd code, which periodically checks if all the lambdas/processes created by a particular eval call are no longer in use. If so, it will remove the temporary module. If not, then it will continue to wait. It is not the greatest solution in the world, but it will get the job done.

This means no Reia code will ever go through erl_eval. Everything is compiled. This will make code loading of all sorts, and eval, much faster. There are no longer any rainbow farting Domo-kuns.

What About Neotoma?

When I originally began my rewrite of Reia, I was attempting to redo the parser using Neotoma, a Parsing Expression Grammar (PEG) tool for Erlang, similar to Ruby's Treetop.

I eventually shied away. This had little to do with Neotoma itself, but my own inability to understand PEGs, and the fact that my own inability to understand them was a roadblock in continued development. Because of this, I switched back to more familiar tools: leex and yecc, the Erlang equivalents of lex and yacc.

Neotoma has come a long way and become better than ever. I am still considering using it. I think it would be a great tool for solving a lot of problems that aren't presently solved, like handling Reia expressions within interpolated strings. This is something I might pursue when I am confident that development otherwise is coming along at a steady pace, but at this point, switching to Neotoma is a lower priority for me than developing a rock-solid core language.

Where's the Code?

If you're interested in checking out the latest Reia codebase, it's available on this branch on Github:

http://github.com/tarcieri/reia/tree/minimalist

If you're looking at master, and wondering why it hasn't been touched in months, it's because I'm hacking on the new branch, not the previous implementation.

The new implementation is not generally usable. I am still working out the nasty details of implementing a compiled eval, as well as implementing cleaner compiler internals.

But hey, if you're interested in Reia, check it out and let me know what you think.

Wednesday, November 4, 2009

RIP "FatELF"

I remember installing Solaris onto a 64-bit UltraSPARC many years ago. When I did it, lo and behold, 32-bit and 64-bit versions of all libraries were installed side-by-side. I could still run the many proprietary 32-bit Solaris apps needed by my coworkers, but we could compile memory-intensive scientific models as 64-bit no problem.

Flash forward to today, and Windows and OS X have both figured this out. Windows 7 uses a similar solution to Solaris, installing both 32-bit and 64-bit versions of all libraries, and having a separate "x86" folder for all programs. OS X uses "universal binaries," which allows builds for multiple architectures to be packaged into the same binary. In either case, everything Just Works and it's great!

Off in Linux land, it's a distribution-by-distribution attempt at solutions for this problem. Some distributions have it figured out, others don't. The solutions chosen by various distributions aren't the same. On some of the more popular distributions there is no Just Works solution to running legacy 32-bit programs on a 64-bit install. Even if you are able to install a base set of 32-bit libraries, installing the many other dependencies of a 32-bit program on a 64-bit system can often be a rather challenging task.

So it was rather disappointing to read that an attempt to add OS X-like universal binary support to Linux, the so-called FatELF executable format, was discontinued today. FatELF offers something Linux desperately needs: a kernel-level solution to the 32/64-bit binary problem, the kind every distribution could automatically fall in line with. The infamous Ulrich Drepper's response to the suggestion of fat binaries for Linux was expectedly blunt:
Yes. It is a "solution" which adds costs in many, many places for a problem that doesn't exist. I don't see why people even spend a second thinking about this.
Yes, Ulrich Drepper, the 32/64-bit binary support problem on Linux is totally and completely solved. It should be no problem to install a 32-bit version of any application on any Linux system today, right? Even if it's running a 64-bit distribution? Yes, that problem does not exist.

Maybe if we all pretend the problem doesn't exist it will go away.