Sunday, November 29, 2009

Reia: now fully compiled (and sometimes JITed)

One of the most frequent questions I get about Reia is its execution model. Is it a simple parse tree walker? Is it interpreted? Is it compiled?

The old branch of Reia made extensive use of the Erlang metacircular interpreter, which is a parse tree walker. Any code within modules or classes, however, was compiled to Erlang bytecode. Reia did autodetect HiPE (the Erlang native code compiler/JIT) when available, and would use it when compiling modules/classes.

The new branch of Reia does not make use of the Erlang metacircular interpreter at all. Instead, all code, including any code which is eval'd, is translated into Erlang then compiled by the Erlang compiler. This means Reia is now fully compiled, and will compile to native code when your Erlang installation supports it.

"AutoHiPE" is off by default for now, if only because HiPE has a slightly greater startup time than the normal BEAM interpreter.

HiPE has other problems as well. It has limited platform support; x86-64 is not one of the supported platforms. Given that BEAM is fundamentally a register machine, you'd think it would be ripe for compilation to native code via something like LLVM.

But for now, enjoy native code compilation on the platforms that support it by passing {autohipe, true} as a compiler option.
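
For the curious, here's a minimal sketch of what that kind of autodetection can look like on the Erlang side. This isn't Reia's actual compiler code (the module and function names here are made up), but erlang:system_info(hipe_architecture) and the native compiler option are the standard way to detect and use HiPE:

    %% Illustrative sketch only: check whether this Erlang runtime was built
    %% with HiPE support, and add the 'native' option when compiling forms.
    -module(autohipe_sketch).
    -export([compile_forms/2]).

    hipe_available() ->
        erlang:system_info(hipe_architecture) =/= undefined.

    compile_forms(Forms, Options) ->
        ErlOptions = case proplists:get_bool(autohipe, Options) andalso hipe_available() of
            true  -> [native, report_errors, report_warnings];
            false -> [report_errors, report_warnings]
        end,
        compile:forms(Forms, ErlOptions).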

Wednesday, November 25, 2009

The new Reia: now without rainbow-farting Domo-kuns

Well, what can be said: the buzz about Reia has died down, and several people are confused about the state of the project. Is it going forward? Is it being actively developed? What's this Peridot thing?

I recently attended RubyConf and was asked by many people about the state of Reia. My answer was always "wow, I should really blog about that," if only to avoid repeating the same things to people over and over (after all, the Ruby community emphasizes DRY, right?). Well, here is that blog post.

The State of Reia: Forget Peridot

To put it bluntly, Reia is undergoing a ground-up rewrite. This does not mean I am abandoning the previous codebase. Far from it. Instead I am rewriting the parts of it which are problematic, and pulling in code from the previous implementation where it makes sense.

When I talk to various people who are implementing languages for fun, it seems most are interested in producing one-off experiments, with little attention paid to developing them into "real" languages that might someday actually be used in production by a lot of people. This certainly makes sense, and that's how I started with Reia. However, buzz grew, and so did my investment in the project. Every indicator I've been given has shown me that Reia is something a lot of people are potentially interested in, so half-assing it isn't going to cut it.

The new Reia roadmap calls for reaching complete feature parity with Erlang using as minimal an implementation as possible, then making it rock solid. At this point, while Reia will lack many of the "dream features" of the previous implementation, it will be generally usable as an alternative to Erlang. Once new language features become available, existing programs can be modified to make use of them. After all of this is done, syntactic sugar can be added, and finally, the concurrent object model.

Initially I had thought of splitting off these additional features into a separate language, which I called "Peridot", but after long and careful consideration, this doesn't make sense. The new Reia will start as an alternative syntax for Erlang (with destructive assignment) but will grow to include all of the features I had originally envisioned.

What Was Wrong with the Previous Implementation?

Why rebuild Reia from the ground up? Can't the existing implementation be salvaged and molded into something usable?

There are a number of problems with the existing implementation. Some stem from my lack of knowledge of Erlang itself when I started. Some of them stem from my lack of knowledge of best practices when writing a language compiler in Erlang. Others stem from the organic way the language evolved. But above everything else, the problems stem from one feature I deemed absolutely necessary: eval.

I like eval lots! If nothing else, it exists for one reason: so Reia can have an interactive shell (i.e. a read-eval-print loop, a.k.a. REPL). I spend lots of my time hacking on Ruby code by interacting with it from irb, the interactive Ruby interpreter. I have a very hard time working with languages that do not provide some form of interactive interpreter nowadays.

The biggest problem with eval is that you have to write an implementation of it for your own language. In the previous version of Reia I tried to sidestep that by using erl_eval, the Erlang metacircular interpreter, as my eval implementation. Unfortunately, to facilitate this, I ended up implementing the entire code loading process in a way which shoved everything through erl_eval. The result ended up looking something like this:
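
For context, here's roughly what evaluation through erl_eval looks like. This is a simplified illustration rather than Reia's actual code, but it shows the scan/parse/interpret pipeline everything was funneled through:

    %% Simplified illustration of eval by way of erl_eval: tokenize the
    %% source, parse it into expressions, then let the metacircular
    %% interpreter walk the expression tree.
    -module(erl_eval_sketch).
    -export([eval_string/1]).

    eval_string(Source) ->
        {ok, Tokens, _EndLine} = erl_scan:string(Source),
        {ok, Exprs} = erl_parse:parse_exprs(Tokens),
        {value, Result, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()),
        Result.

    %% eval_string("1 + 2.") returns 3, but every step of the evaluation is
    %% interpreted rather than compiled.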

[Diagram: the previous wonky-ass Reia compiler]

When code entered the system, it was first parsed and converted to Erlang by reia_compiler (the Domo-kuns). For module and class declarations, the code was compiled down to Erlang bytecode (the rainbow farts), which was in turn injected into the toplevel Erlang AST. In other words, the toplevel scope of any Reia file was never compiled, but simply stored as expressions, and where module/class declarations existed, instructions to load the compiled module (which itself was injected directly into the AST) were issued. This provided somewhat Ruby-like semantics for module/class declarations: they're "loaded" at the time they're declared.
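
As a rough reconstruction (mine, not the original source), such a "load instruction" looked something along these lines: the compiled BEAM binary was turned into an abstract term and wrapped in a call to code:load_binary/3 for erl_eval to execute later:

    %% Rough reconstruction: build an abstract call to code:load_binary/3
    %% with the compiled BEAM binary embedded directly in the AST.
    -module(load_instruction_sketch).
    -export([load_instruction/2]).

    load_instruction(Name, Beam) ->
        {call, 1,
         {remote, 1, {atom, 1, code}, {atom, 1, load_binary}},
         [{atom, 1, Name},
          {string, 1, "reia_module"},
          erl_parse:abstract(Beam)]}.

    %% Evaluating this form amounts to calling:
    %%   code:load_binary(Name, "reia_module", <<BEAM bytecode>>)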

The resulting Erlang AST, complete with compiled class/module fragments, was then shoved through the Erlang metacircular interpreter, erl_eval (seen in the diagram as the tornado). As you might guess, compared to compiled Erlang the metacircular interpreter is reaaaaaaaaally slow.

Once everything was said and done, the resulting class/modules were loaded into the Erlang code server, pictured here as a hungry Joe Armstrong going *nom* *nom* *nom*.

Making Reia Faster


As you may guess, an as-yet-unstated goal of this rewrite is to improve the speed of code loading. Previously, Reia could not have a large standard library because it took so long to load code. Furthermore, implementing a mechanism for caching compiled bytecode was impossible due to the structure of the code loading API.

The new code loading API directly compiles everything, including any code which is eval'd. This not only makes everything significantly faster, but also opens up the possibility of caching compiled bytecode and fixes various bugs surrounding the eval implementation. From what I've gathered elsewhere, most compiled languages ditch any form of metacircular interpreter and implement eval by compiling temporary modules.
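
A bare-bones version of that approach in Erlang might look like the following. This is my own sketch rather than Reia's implementation: wrap the already-translated expressions in a throwaway module, compile it in memory, load it, call it, and (naively) clean up:

    %% Sketch of eval via a temporary compiled module. Exprs is a list of
    %% Erlang abstract expressions, i.e. what the Reia-to-Erlang translation
    %% step produces.
    -module(compiled_eval_sketch).
    -export([eval_exprs/1]).

    eval_exprs(Exprs) ->
        Mod = reia_eval_tmp,   %% a real implementation would generate unique names
        Forms = [{attribute, 1, module, Mod},
                 {attribute, 1, export, [{run, 0}]},
                 {function, 1, run, 0, [{clause, 1, [], [], Exprs}]}],
        {ok, Mod, Beam} = compile:forms(Forms, []),
        {module, Mod} = code:load_binary(Mod, "reia_eval", Beam),
        Result = Mod:run(),
        %% Naive cleanup: this is unsafe if the eval'd code spawned processes
        %% or returned funs, which is exactly the problem discussed below.
        code:delete(Mod),
        code:soft_purge(Mod),
        Result.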

Doing this in Erlang is hard, because certain Erlang expressions create things which live on after the code has been evaluated, namely processes and lambdas (a.k.a. funs). This was a vexing problem for me for quite some time, but after talking with Robert "Hello Robert" Virding, one of Erlang's creators, I believe I've come upon a workable solution, even if it's a bit of a hack.

Reia will implement its own "garbage collector" process for eval'd code, which periodically checks if all the lambdas/processes created by a particular eval call are no longer in use. If so, it will remove the temporary module. If not, then it will continue to wait. It is not the greatest solution in the world, but it will get the job done.
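
A rough sketch of what such a collector could look like follows. Again, this is my illustration of the idea rather than Reia's actual code: temporary eval modules are registered with the collector, which periodically tries to unload each one, relying on code:soft_purge/1 to refuse the purge while any process or fun still references the code.

    %% Sketch of an "eval garbage collector" process for temporary modules.
    -module(eval_gc_sketch).
    -export([start/0, watch/2]).

    start() ->
        spawn(fun() -> loop([]) end).

    watch(Collector, Module) ->
        Collector ! {watch, Module}.

    loop(Modules) ->
        receive
            {watch, Mod} -> loop([Mod | Modules])
        after 5000 ->
            %% keep only the modules that could not be unloaded yet
            loop([Mod || Mod <- Modules, not try_unload(Mod)])
        end.

    try_unload(Mod) ->
        code:delete(Mod),        %% mark the module's current code as old (if any)
        code:soft_purge(Mod).    %% true only once nothing uses the old code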

This means no Reia code will ever go through erl_eval. Everything is compiled. This will make code loading of all sorts, and eval, much faster. There are no longer any rainbow farting Domo-kuns.

What About Neotoma?

When I originally began my rewrite of Reia, I was attempting to redo the parser using Neotoma, a Parsing Expression Grammar (PEG) tool for Erlang, similar to Ruby's Treetop.

I eventually shied away from it. This had little to do with Neotoma itself and everything to do with my own inability to understand PEGs, which was becoming a roadblock to continued development. Because of this, I switched back to more familiar tools: leex and yecc, the Erlang equivalents of lex and yacc.

Neotoma has come a long way and become better than ever. I am still considering using it. I think it would be a great tool for solving a lot of problems that aren't presently solved, like handling Reia expressions within interpolated strings. This is something I might pursue when I am confident that development otherwise is coming along at a steady pace, but at this point, switching to Neotoma is a lower priority for me than developing a rock-solid core language.

Where's the Code?

If you're interested in checking out the latest Reia codebase, it's available on this branch on Github:

http://github.com/tarcieri/reia/tree/minimalist

If you're looking at master, and wondering why it hasn't been touched in months, it's because I'm hacking on the new branch, not the previous implementation.

The new implementation is not generally usable. I am still working out the nasty details of implementing a compiled eval, as well as implementing cleaner compiler internals.

But hey, if you're interested in Reia, check it out and let me know what you think.

Wednesday, November 4, 2009

RIP "FatELF"

I remember installing Solaris onto a 64-bit UltraSPARC many years ago. When I did it, lo and behold, 32-bit and 64-bit versions of all libraries were installed side-by-side. I could still run the many proprietary 32-bit Solaris apps needed by my coworkers, but we could compile memory-intensive scientific models as 64-bit no problem.

Flash forward to today, and Windows and OS X have both figured this out. Windows 7 uses a solution similar to Solaris's, installing both 32-bit and 64-bit versions of all libraries and keeping a separate "x86" folder for 32-bit programs. OS X uses "universal binaries," which allow builds for multiple architectures to be packaged into the same binary. In either case, everything Just Works and it's great!

Off in Linux land, it's a distribution-by-distribution attempt at solutions for this problem. Some distributions have it figured out, others don't. The solutions chosen by various distributions aren't the same. On some of the more popular distributions there is no Just Works solution to running legacy 32-bit programs on a 64-bit install. Even if you are able to install a base set of 32-bit libraries, installing the many other dependencies of a 32-bit program on a 64-bit system can often be a rather challenging task.

So it was rather disappointing to read that an attempt to add OS X-like universal binary support to Linux, the so-called FatELF executable format, was discontinued today. FatELF offers something Linux desperately needs: a kernel-level solution to the 32/64-bit binary problem, the kind every distribution could automatically fall in line with. The infamous Ulrich Drepper's response to the suggestion of fat binaries for Linux was expectedly blunt:
Yes. It is a "solution" which adds costs in many, many places for a problem that doesn't exist. I don't see why people even spend a second thinking about this.
Yes, Ulrich Drepper, the 32/64-bit binary support problem on Linux is totally and completely solved. It should be no problem to install a 32-bit version of any application on any Linux system today, right? Even if it's running a 64-bit distribution? Yes, that problem does not exist.

Maybe if we all pretend the problem doesn't exist it will go away.