Sunday, December 13, 2009

Reia: now with "Magic Rebinding"

In Ruby, thanks to its first class syntax for "hashes" and mutable state, it's quite easy to do:
h = {}
h[key] = value
The equivalent code in Erlang is noisier, thanks to immutable state and single assignment:
Dict1 = dict:new(),
Dict2 = dict:store(Key, Value, Dict1).
Since Reia lacks mutable state, it never before had syntax as simple as Ruby's... but now it does!

I have been trying to hold off on adding syntactic sugar like this to my new "minimalist" branch of Reia. However, this is a feature I meant to add to the old implementation, and tried to retrofit it in long after the implementation had grown quite complex, never managing to succeed. I decided to tackle it now, and I'm happy to announce that it works! Furthermore, it can be used in complex pattern matching expressions:
>> m = {}
=> {}
>> (m[:foo], m[:bar], m[:baz]) = (1,2,3)
=> (1,2,3)
>> m
=> {:bar=>2,:baz=>3,:foo=>1}
So what is going on here exactly? Reia is an immutable state language, so surely I'm not mutating the value that "m" references.

In these cases, Reia is "altering" the local variable binding. Each time you change a member of a map ("hash" for you Ruby folks, "dict" for you Erlang folks), a new version of that map is calculated, then bound to "m". Behind the scenes, the Reia compiler is translating these calls into destructive assignments.
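For the curious, here's a rough sketch of what that translation presumably looks like at the Erlang level (the generated variable names and the use of dict are my assumption based on the Erlang comparison above, not actual compiler output). Each element assignment produces a new version of the map, bound to a fresh variable:

%% m = {}; m[:foo] = 1; m[:bar] = 2
M0 = dict:new(),
M1 = dict:store(foo, 1, M0),
M2 = dict:store(bar, 2, M1).
%% later references to "m" are rewritten to point at the newest version (M2)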

Maps, Tuples, and even Lists now support assignments in this way (although Lists only for consistency's sake... I hate to think of people actually setting values in lists by index). Tuples and Lists even support Ruby-style negative indexing:
>> t = (1,2,3)
=> (1,2,3)
>> t[-1] = 42
=> 42
>> t
=> (1,2,42)
I plan on eventually exposing this functionality to user-defined types as well, in the form of "bang methods" on immutable objects. Users of Ruby are likely familiar with them:
>> arr = [1,2,3]
=> [1,2,3]
>> arr.reverse; arr
=> [1,2,3]
>> arr.reverse!; arr
=> [3,2,1]
Here you can see that calling the "reverse" method on an array (without the !) does not modify the original array in-place. Instead, it returns a new array in reverse order. The complementary "reverse!" method performs an in-place modification of the array.

The "method!" idiom in Ruby is generally used to indicate methods that modify their receivers as opposed to versions which do not. However this is not a solid requirement, and "!" is often added to any methods considered "dangerous". There's no specific meaning to putting "!" on the end of a method and certainly nothing Ruby does differently at the language level.

In Reia, "bang methods" will be a first class language construct, and will always rebind the receiver with the return value of the method. This will provide a simple way to allow "in place" modifications of immutable objects, by having "bang methods" create and return a new immutable object.

It's the best of both worlds: the ease of use that comes from mutable state, with the concurrency benefits of being completely immutable.

Sunday, November 29, 2009

Reia: now fully compiled (and sometimes JITed)

One of the most frequent questions I get about Reia is its execution model. Is it a simple parse tree walker? Is it interpreted? Is it compiled?

The old branch of Reia made extensive use of the Erlang metacircular interpreter, which is a parse tree walker. Any code within modules or classes, however, was compiled to Erlang bytecode. Reia did autodetect HiPE (the Erlang native code compiler/JIT) when available, and would use it when compiling modules/classes.

The new branch of Reia does not make use of the Erlang metacircular interpreter at all. Instead all code, including any code which is eval'd, is translated into Erlang then compiled by the Erlang compiler. This means Reia is 100% fully compiled, and will compile to native code when your Erlang interpreter supports it.

"AutoHiPE" is off by default for now, if only because HiPE has a slightly greater startup time than the normal BEAM interpreter.

HiPE has some additional problems as well, chief among them limited platform support: x86-64 is not one of the supported platforms. Given that BEAM is fundamentally a register machine, you'd think it'd be ripe for compilation to native code via something like LLVM.

But for now, enjoy native code compilation on the platforms that support it by passing {autohipe, true} as a compiler option.
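For comparison, this is what native code compilation looks like in stock Erlang, assuming your Erlang/OTP build includes HiPE (the module name here is hypothetical). From the Erlang shell:

c(my_module, [native]).

or programmatically:

compile:file("my_module.erl", [native]).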

Wednesday, November 25, 2009

The new Reia: now without rainbow-farting Domo-kuns

Well, what can be said: the buzz about Reia has died down, and several people are confused about the state of the project. Is it going forward? Is it being actively developed? What's this Peridot thing?

I recently attended RubyConf and was asked by many people about the state of Reia. My answer was always "wow, I should really blog about that" to avoid repeating the same things to people over and over (after all, the Ruby community emphasizes DRY, right?). Well, here is that blog.

The State of Reia: Forget Peridot

To put it bluntly, Reia is undergoing a ground-up rewrite. This does not mean I am abandoning the previous codebase. Far from it. Instead I am rewriting the parts of it which are problematic, and pulling in code from the previous implementation where it makes sense.

When I talk to various people who are implementing languages for fun, it seems most people are interested in producing one-off experiments, with little attention to someday developing them into "real" languages that might actually be used in production by a lot of people. This certainly makes sense, and that's how I started with Reia. However, buzz grew, and so did my investment in the project. Every indicator I've been given has shown me that Reia is something a lot of people are potentially interested in, so half-assing it isn't going to cut it.

The new Reia roadmap calls for reaching complete feature parity with Erlang with as minimal an implementation as possible, then making it rock solid. At this point, while Reia will lack many of the "dream features" of the previous implementation, it will be generally usable as an alternative to Erlang. Once new language features become available, existing programs can be modified to make use of them. After all this is done, syntactic sugar can be added, and finally, the concurrent object model.

Initially I had thought of splitting off these additional features into a separate language, which I called "Peridot", but after long and careful consideration, this doesn't make sense. The new Reia will start as an alternative syntax for Erlang (with destructive assignment) but will grow to include all of the features I had originally envisioned.

What Was Wrong with the Previous Implementation?

Why rebuild Reia from the ground up? Can't the existing implementation be salvaged and molded into something usable?

There are a number of problems with the existing implementation. Some stem from my lack of knowledge of Erlang itself when I started. Some of them stem from my lack of knowledge of best practices when writing a language compiler in Erlang. Others stem from the organic way the language evolved. But above everything else, the problems stem from one feature I deemed absolutely necessary: eval.

I like eval lots! If nothing else, it exists for one reason: so Reia can have an interactive shell (i.e. a read-eval-print loop, a.k.a. REPL). I spend lots of my time hacking on Ruby code by interacting with it from irb, the interactive Ruby interpreter. I have a very hard time working with languages that do not provide some form of interactive interpreter nowadays.

The biggest problem with eval is that you have to write your own implementation of it for your language. In the previous version of Reia I tried to sidestep that by using erl_eval, the Erlang metacircular interpreter, as my eval implementation. Unfortunately, to facilitate this, I ended up implementing the entire code loading process in a way which shoved everything through erl_eval. The result ended up looking something like this:

the previous wonky ass Reia compiler

When code entered the system, it was first parsed and converted to Erlang by reia_compiler (the Domo-kuns). For module and class declarations, the code was compiled down to Erlang bytecode (the rainbow farts), which was in turn injected into the toplevel Erlang AST. In other words, the toplevel scope of any Reia file was never compiled, but simply stored as expressions, and where module/class declarations existed, instructions to load the compiled module (which itself was injected directly into the AST) were issued. This provided somewhat Ruby-like semantics for module/class declarations: they're "loaded" at the time they're declared.

The resulting Erlang AST, complete with compiled class/module fragments, was then shoved through the Erlang metacircular interpreter, erl_eval (seen in the diagram as the tornado). As you might guess, compared to compiled Erlang the metacircular interpreter is reaaaaaaaaally slow.
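For those who haven't seen it, this is what the erl_eval path looks like in plain Erlang (a minimal example, nothing Reia-specific): scan a string into tokens, parse the tokens into abstract forms, then hand the forms to the interpreter to walk.

{ok, Tokens, _} = erl_scan:string("lists:seq(1, 3)."),
{ok, Exprs} = erl_parse:parse_exprs(Tokens),
{value, Result, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()).
%% Result is [1,2,3], but every step of the evaluation was interpreted, not compiled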

Once everything was said and done, the resulting class/modules were loaded into the Erlang code server, pictured here as a hungry Joe Armstrong going *nom* *nom* *nom*.

Making Reia Faster


As you may guess, an as-yet-unstated goal of this rewrite is to improve the speed of code-loading. Previously, Reia could not have a large standard library, because it took so long to load code. Furthermore, implementing a mechanism for caching compiled bytecode was impossible due to the API structure.

The new code-loading API directly compiles everything, including any code which is eval'd. This not only makes everything significantly faster, it also opens the door to caching compiled bytecode and eliminates various bugs surrounding the eval implementation. From what I've gathered elsewhere, most compiled languages generally ditch any form of metacircular interpreter and implement eval by compiling temporary modules.

Doing this in Erlang is hard, because certain expressions in Erlang create things which outlive the evaluation itself, namely processes and lambdas (a.k.a. funs). This was a vexing problem to me for quite some time, but after talking with Robert "Hello Robert" Virding, one of Erlang's creators, I believe I've come upon a workable solution, even if it's a bit of a hack.

Reia will implement its own "garbage collector" process for eval'd code, which periodically checks if all the lambdas/processes created by a particular eval call are no longer in use. If so, it will remove the temporary module. If not, then it will continue to wait. It is not the greatest solution in the world, but it will get the job done.
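To give a rough idea of the compiled-eval approach, here's a generic Erlang sketch (not Reia's actual implementation): wrap the expressions in a throwaway module, compile it in memory, load it, call it, then delete the module once it's safe to do so. The hard part, which the "garbage collector" process above exists to solve, is knowing when "safe" actually is.

eval_via_compile() ->
    %% abstract forms for: -module(tmp_eval). -export([run/0]). run() -> lists:seq(1, 3).
    Forms = [{attribute, 1, module, tmp_eval},
             {attribute, 1, export, [{run, 0}]},
             {function, 1, run, 0,
              [{clause, 1, [], [],
                [{call, 1, {remote, 1, {atom, 1, lists}, {atom, 1, seq}},
                  [{integer, 1, 1}, {integer, 1, 3}]}]}]}],
    {ok, tmp_eval, Bin} = compile:forms(Forms),
    {module, tmp_eval} = code:load_binary(tmp_eval, "tmp_eval.erl", Bin),
    Result = tmp_eval:run(),          %% [1,2,3]
    code:delete(tmp_eval),            %% mark the module as old...
    code:purge(tmp_eval),             %% ...and remove it, assuming nothing still needs it
    Result.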

This means no Reia code will ever go through erl_eval. Everything is compiled. This will make code loading of all sorts, and eval, much faster. There are no longer any rainbow farting Domo-kuns.

What About Neotoma?

When I originally began my rewrite of Reia, I was attempting to redo the parser using Neotoma, a Parsing Expression Grammar (PEG) tool for Erlang, similar to Ruby's Treetop.

I eventually shied away. This had little to do with Neotoma itself and everything to do with my own inability to understand PEGs, which became a roadblock to continued development. Because of this, I switched back to more familiar tools: leex and yecc, the Erlang equivalents of lex and yacc.

Neotoma has come a long way and become better than ever. I am still considering using it. I think it would be a great tool for solving a lot of problems that aren't presently solved, like handling Reia expressions within interpolated strings. This is something I might pursue when I am confident that development otherwise is coming along at a steady pace, but at this point, switching to Neotoma is a lower priority for me than developing a rock-solid core language.

Where's the Code?

If you're interested in checking out the latest Reia codebase, it's available on this branch on Github:

http://github.com/tarcieri/reia/tree/minimalist

If you're looking at master, and wondering why it hasn't been touched in months, it's because I'm hacking on the new branch, not the previous implementation.

The new implementation is not generally usable. I am still working out the nasty details of implementing a compiled eval, as well as implementing cleaner compiler internals.

But hey, if you're interested in Reia, check it out and let me know what you think.

Wednesday, November 4, 2009

RIP "FatELF"

I remember installing Solaris onto a 64-bit UltraSPARC many years ago. When I did it, lo and behold, 32-bit and 64-bit versions of all libraries were installed side-by-side. I could still run the many proprietary 32-bit Solaris apps needed by my coworkers, but we could compile memory-intensive scientific models as 64-bit no problem.

Flash forward to today, and Windows and OS X have both figured this out. Windows 7 uses a similar solution to Solaris, installing both 32-bit and 64-bit versions of all libraries, and having a separate "x86" folder for all programs. OS X uses "universal binaries," which allow builds for multiple architectures to be packaged into the same binary. In either case, everything Just Works and it's great!

Off in Linux land, it's a distribution-by-distribution attempt at solutions for this problem. Some distributions have it figured out, others don't. The solutions chosen by various distributions aren't the same. On some of the more popular distributions there is no Just Works solution to running legacy 32-bit programs on a 64-bit install. Even if you are able to install a base set of 32-bit libraries, installing the many other dependencies of a 32-bit program on a 64-bit system can often be a rather challenging task.

So it was rather disappointing to read that an attempt to add OS X-like universal binary support to Linux, the so-called FatELF executable format, was discontinued today. FatELF offers something Linux desperately needs: a kernel-level solution to the 32/64-bit binary problem, the kind every distribution could automatically fall in line with. The infamous Ulrich Drepper's response to the suggestion of fat binaries for Linux was expectedly blunt:
Yes. It is a "solution" which adds costs in many, many places for a problem that doesn't exist. I don't see why people even spend a second thinking about this.
Yes, Ulrich Drepper, the 32/64-bit binary support problem on Linux is totally and completely solved. It should be no problem to install a 32-bit version of any application on any Linux system today, right? Even if it's running a 64-bit distribution? Yes, that problem does not exist.

Maybe if we all pretend the problem doesn't exist it will go away.

Saturday, July 18, 2009

A new direction for Reia: Peridot?

I've always viewed Reia as what I hoped would become a spiritual successor to Ruby, in the same way that Ruby was the spiritual successor of Perl. Talking with one of my roommates, he pointed out that pearls and rubies are the birthstones of June and July respectively, so an interesting name for the spiritual successor of Ruby would be Peridot. If I had the chance to do it all over again, I probably would've named Reia "Peridot" instead, but as it stands I've already built up a decent degree of mindshare around "Reia", so renaming the language probably isn't practical. I'll come back to Peridot in a bit.

When I first started using Erlang, one of the first things I wanted to do was give it a Ruby-like syntax. It seems like two of the biggest reasons people starting out with Erlang reject it are its ugly syntax and single assignment. Lately I've been wondering if there would be value in a language which is semantically identical to Erlang except with destructive assignment and a Ruby-like syntax. Lisp Flavored Erlang has seen a lot of interest and is a much simpler undertaking than Reia because it merely provides an alternative syntax and doesn't try to add new and complex semantics to the language. Perhaps there's a niche for a "Ruby-flavored Erlang" which provides a Ruby-like syntax, destructive assignment, and possibly a bit of syntactic sugar while preserving the underlying semantics of Erlang and not trying to add anything new.

With Reia in its current form I feel like I've bitten off a bit more than I can chew. Worse, for the past few months I've been stuck on a particularly difficult problem and also very busy. On top of that, some bad decisions in the initial compiler design plus my frustration with Erlang syntax have left me wanting to rewrite the compiler as a self-hosted implementation, but I don't think Reia as a language is ready for that yet.

Another thing that has popped onto the scene is neotoma, a Parsing Expression Grammar-based parser generator for Erlang. Ever since I began implementing interpolated strings in Reia I have longed for something like this. I have hacked and kludged my way along implementing interpolation with leex and yecc, but a PEG would solve the issue completely and allow for nested interpolated strings of the sort Ruby supports. This has left me wanting to rewrite the scanner/parser for Reia using neotoma instead.

So what to do? How should I proceed? The idea of a simpler Reia has certainly been bouncing around in my head for awhile. I am seriously thinking of reinventing Reia as something more like Erlang, then continuing on to add things like an object system in a new language: Peridot. There are a number of interesting things this would allow. First, Reia would effectively provide a subset of what's in Peridot, and most Reia programs would be valid Peridot programs. In that regard, Reia would work something like RPython and would make a great bootstrap language for implementing Peridot. A reduced Reia would be much easier to get to production quality than one which incorporates all of the elaborate features I've currently tried to implement. And it would once and for all put to rest the complaints about Erlang syntax and single assignment.

I'm interested to hear what people think about this proposal.

Thursday, June 4, 2009

Dear Viacom: You're Doing It Wrong

I've been very excited about the upcoming release of The Beatles: Rock Band after hearing about it earlier this week. It's the first PS3 game sold on disc I'm going to snap up since, well, Rock Band 2. Beyond the simple fact that it's Rock Band loaded with Beatles music (and I love the Beatles), I've been rather impressed by the visual style, particularly this intro video directed by Pete Candeland of Gorillaz music video fame:


Hopefully this video hasn't been flagged for a copyright violation by the time you read this post. It rules.

However, I was rather surprised to see that a particular copy of the intro video was flagged for removal from YouTube due to a copyright violation. Go ahead, hit play, I dare ya:



Yes, Viacom has decided that they want to forego free advertising for their upcoming video game in order to defend their copyright. WTF? Somebody doesn't get it. Viacom, this isn't someone infringing your copyright. This is someone providing you with a viral marketing campaign for free. You are effectively telling them: "no, don't advertise our product for free. We don't want that."

I foresee a long uphill battle until old media companies finally realize that viral video distribution is actually a good thing. Eventually they'll be dragged kicking and screaming to the realization that piracy is good.

Tuesday, June 2, 2009

Ubuntu's Jackalope not so Jaunty

I'm not typically one to write reviews of things like desktop Linux systems. In fact, any of you who read my blog for Reia should probably just stop now. But I just tried desktop Linux for the first time in two years, and my experience was anything but pleasurable.

My Background (a.k.a. chance to be a blowhard)

For the past several years OS X has been my desktop of choice. I get a beautiful, slickly animated GUI interface, seamless 3D compositing of all UI elements, nifty commercial software, and Unix underpinnings. Sure, it's proprietary, but I don't give a crap.

That said, I am no stranger to desktop Linux. My first desktop Linux experience was using FVWM on a Slackware 2.3 system back in 1995. So yes: I'm one of those Linux users that survived the transition from a.out to ELF and from libc5 to glibc. I tried a few different distributions after that, next RedHat and finally Debian, before becoming a Debian person. RedHat 5.0, when they made the switch to glibc, was such an unmitigated disaster that I destroyed the install CD (purchased from a store) out of rage. That is the lowest low I think I've ever seen Linux reach.

I remember trying out an early Enlightenment, which leaked memory so quickly it completely consumed the 16MB of RAM I had installed at the time. Eventually I would discover WindowMaker, which would be my standby window manager for years to come. I flitted about with OS choices after that, running FreeBSD as my primary desktop for quite some time.

Around 2001 I discovered the Synergy software which lets you seamlessly share a keyboard and mouse across two computers. From then on I loved running two computers, typically one with Windows and one with my *IX du jour. This has remained my standard configuration for quite some time.

Around 2006 I was given a new monitor for work, with a strange 1680x1050 resolution. I was running Debian at the time, ripping my hair out hand editing my X config trying to get it to work properly. I could not for the life of me figure out what was wrong, and this was after spending 5 years as a Linux sysadmin. I decided to give Ubuntu a go. I stuck in the install CD, and BEHOLD it booted straight into X and my monitor was automagically configured to the right resolution! I was awestruck.

I'd been against Gnome for years, but by now it seemed almost downright usable. I actually liked having things like desktop icons! It was pretty nifty.

However, shortly thereafter I would buy a MacBook and ditch desktop Linux entirely. I've been running an OS X/Windows Synergy setup ever since (although now I use OS X exclusively at home).

Fast Forward to Today

Amidst many of my coworkers setting up their computers to dual boot Windows and Linux, I figured I'd do the same. I thought it'd be pretty nifty to have OS X on one computer (which would remain my primary development computer) and Linux on the other.

First I installed Windows, which wasn't without its hiccups, but when I was done I was left with a 30GB Windows partition and 220GB free for Linux.

I threw in the Jaunty Jackalope CD one of my coworkers had and started up the graphical installer. I missed the good old text-based Debian installer I had used for over a decade, but hey, it's the 21st century, nothing wrong with graphics, right?

I got to the partitioning step. Now, I've dealt with some pretty bad graphical partition managers in the past. Solaris's was particularly atrocious. At first glance Ubuntu's seemed fine... it recognized I had an NTFS volume and offered me the option to "Install Windows and Linux side by side". I figured this was such a common use case it would just naturally know the right thing to do.

So, I click OK and it pops up a modal dialog asking me if I want to resize my NTFS partition. WTF? Resize my NTFS partition? NO! You have 220GB of free space to work with there, why would you resize my NTFS partition? It offered three choices: a "Go Back" button which was highlighted (and I guess the one I was supposed to choose), a button that said continue/proceed or something, and the traditional "X" in the corner of the modal dialog window to close it. I clicked the last of these, which was the wrong decision.

It then popped up another modal dialog window, informing me it was resizing my NTFS partition to fill the entire disk.



Seriously, are you kidding me? Closing a modal dialog window with the "X" button does NOT MEAN I WANT YOU TO PERFORM A DESTRUCTIVE ACTION. And furthermore, who installs Linux and wants it to resize their Windows partition to eat up the entire disk? Frustrated, I opened up a terminal, started gparted, shrank my Windows partition back down to 30GB, and rebooted to try again fresh.

This time I chose to manually partition my disk (although I still longed for cfdisk) and things seemed to go a little more smoothly for awhile.

After I booted into X for the first time I was prompted to install updates. I hit the "Check" button, which prompted for a password. It downloaded a list of updates. I hit "Install Updates". Nothing happened. I clicked it again, some 30 times. Nothing. The button depressed, and that was it. I called a Linux-loving coworker over; he looked at it and shrugged. "I don't use the graphical updater". Yes, popping open a terminal and typing apt-get upgrade was starting to seem like the way to go here. I clicked "Check" again, then "Install Updates". Magically it worked this time.

After all was said and done, my display was not at the right resolution. It popped up a little notification prompting me to install restricted drivers. I installed the nVidia drivers, which completed successfully, then tried to open my display preferences.

An error dialog popped up, saying that display preferences couldn't be launched, and I need to run my vendor tool instead. It said I could hit OK to do so, but when I did, it said there was an error launching the vendor tool, and I needed to run a particular command from the command line.

Are you kidding me? At this point I'm seriously confused: who is Ubuntu targeting? When was the last time Windows or OS X asked me from a modal dialog to pop open a terminal and type some shit on the command line? I tried running the command and got yet another error. Frustrated I rebooted.

Now when I try to go to the display preferences, at first I get an error saying I need to run the vendor tool, and then it launches the nVidia preferences. Only... the native resolution of my monitor is not listed. Every resolution it offers is below my monitor's native one.

I thought: oh well, I'll just edit my xorg.conf by hand. So I did. And I rebooted. I was still at the same resolution, and the changes I made to my xorg.conf were clobbered by something. I don't know. I reopened the file and they were completely gone. What process overwrote it? I don't have a freaking clue. I remember when Linux gave you a sense of control over what you're doing, but now I feel powerless.

Now, insult to injury: this is the exact same monitor which in 2006 I stuck an Ubuntu install CD into my computer and it launched straight into X at the native resolution. I didn't have to install any restricted drivers. I stuck in the CD and it just worked.

3 years ago Ubuntu had me excited that maybe, finally, desktop Linux was reaching a level of general usability. Now here I am, a power user, and I've run into seemingly intractable problems I can't solve.

Pre-Emptive Anti-Zealot Repellant

Did I go onto forums and ask about my problems? Did I post bugs on Ubuntu's trackers? No. Know what I did? I rebooted into Windows. And here I think I will stay. I freshly installed Windows the same day and had it up and running to my satisfaction in a few hours. Windows is working and I am happy, therefore I don't feel the need to try to help Ubuntu sort out their mess.

All I can say is, without a doubt, Ubuntu 9.04 represents a rather severe regression from my previous experience with using Ubuntu on a desktop. We still continue to run it on our servers at my place of employment, and there I have few complaints. However, at this point I cannot see myself running it on a desktop, and worse, I've lost my sense that desktop Linux is actually getting better over time.

Thursday, May 28, 2009

Reia Presentations

I've given a few presentations on Reia lately. The first was at last month's Erlang Factory conference, where my presentation included a short talk on building languages on top of Erlang, followed by a talk on Reia geared towards Erlang programmers. About a month later I gave a talk on Reia at my local Ruby group, focused on Rubyists.

In case anyone is interested in these talks, I've posted them on Google Docs and made PDF versions of them available.

Here's my talk at Erlang Factory, ostensibly geared at an Erlang-focused audience:


PDF version available here

And here's the same talk, slightly modified for Rubyists, and given at the boulder.rb group:


PDF version available here

Monday, May 11, 2009

Happy 1st Birthday, Reia!

Reia turns one year old today, as measured by the first commit to the Reia repository (well, not literally, there was one commit of the leex code before it which isn't mine). Later that day I'd begin fleshing out the scanner (with leex) and the initial grammar (with yecc). I'd been playing around with leex and yecc before, but this was the first time I really tried to put them together into a coherent language, one which could at least calculate things with C/Algol-style syntax.

Things have come a long way from then! I never would've thought I'd give a talk about Reia before it was even a year old. I've noticed most language designers like to tinker in obscurity for a few years before releasing their creations to the world. I've instead just thrown it up on github and let anybody who wants poke around with it and send me patches.

In the past year, I've managed to implement the following features in Reia:
  • Ruby-like syntax with destructive assignment
  • Rich set of "builtin" types including integers, floats, strings, atoms, tuples, lists, maps (i.e. dicts), "binaries", regexes, funs (i.e. lambdas), process pids, and constants (i.e. module/class names)
  • Pattern matching
  • Asynchronous object system based on Erlang processes with single inheritance
  • Erlang-style processes (i.e. Actors)
  • Ruby-style blocks
  • Function/method references via Python's receiver.method syntax (implemented as funs)
  • List comprehensions
  • (Almost) pure expression grammar, allowing modules and classes to be defined on the shell
  • Self-hosted test system for the language itself
  • Interpolated strings
There are many future features on the roster. Stay tuned and I'll have a series of follow up blog posts on what I'd like to add.

Monday, May 4, 2009

Erlang Factory: A Retrospective

I recently presented on Reia at Erlang Factory in San Francisco. It was a lot of fun, both attending and presenting. Videos of my talk, and of the many others I heard people raving about over IRC and Twitter, will hopefully be available soon. There are certainly many I intend to check out.

The conference certainly exceeded my expectations. I thought I might be something of an outsider in the community, but I was surprised to see a number of Ruby people there, which I found pretty interesting. Ezra's talk on Nanite touched on an important point: there's not a lot of crossover in what Ruby and Erlang do well, and for that reason they're relatively complementary languages. I ended up pitching Reia as what I hoped to be a best-of-both-worlds solution.

There was certainly ample talk about CouchDB, which got me thinking about using Reia as a language for CouchDB views. CouchDB creator Damien Katz gave a pretty cool talk as well, not so much about code as about the personal circumstances which brought him to create CouchDB.

The Powerset guys' talk on Katamari, the evolution of Fuzed, was pretty cool to catch as well. I've certainly run into the issue of needing an intelligent proxy in front of a bunch of slow web services, which so far HAProxy has managed to solve, but if I need a more intelligent frontend to the services of our application, Katamari is something I certainly intend to check out, at least as soon as it's open sourced.

It was pretty cool to check out Nitrogen as well, which managed to do some pretty impressive and Erlangy things with web services. It's also the best usage of Erlang's record syntax I've ever seen. The demo involved pushing the presently active slide to all the people viewing the demo in their web browser, which I've seen done in Ruby before but I suspect the Erlang version is far less hackish :)

Twitter seemed to be haunting me throughout the conference. I hacked together a script which posted Twitter updates with the #erlangfactory tag to the #erlangfactory channel on freenode. Twitter helped me link up with the people who found the lost AC adapter to my MacBook, and also helped me figure out whose Apple remote I accidentally snagged after my talk. It was really handy.

However, people also talked to me about my blog post regarding Twitter's switch to Scala, and we all talked about how simple the core problem Twitter is trying to solve actually is, and how easily it could be (and has been) implemented in Erlang. The Twitter people did their "due diligence" and decided that Scala was best suited to their needs. Whatever guys, I think Erlang could've helped you out considerably. I recall a post on a site like HighScalability (although I can't find it) where Twitter said they evaluated Erlang circa 2007 and the lone developer pitching it couldn't get his prototype to work, so I guess Erlang isn't applicable there, or something? When the Scala prototype works I guess that's what they use...

The general consensus discussing the matter with various Erlangy people was that Twitter was little more than a massive pub/sub message queue which delivers messages to a backing database (from whence their RoR webapp serves their site), which isn't really that hard of a problem and is the kind of problem almost ideally suited to Erlang. Yet the Twitter people eschewed Erlang for Scala and, on a totally unrelated matter, run a perpetually unstable albeit massively hyped service. Way to go, guys.

There are quite a few talks I missed which I want to check out, particularly a talk given on Haskell. I certainly hope the videos (including mine) get posted soon.

Despite ample conversation about CouchDB, there was virtually no discussion of the sort that has Ruby drama queens' panties in a bunch, something about a presentation involving scantily clad girls that offended people. So awesome to be in a community that's still small enough to value code over anything else :)

All said it was a great conference and one I look forward to attending (or speaking at again) next year.

Monday, April 6, 2009

Why I don't like Scala

Scala is a hybrid functional/imperative object/actor language for the Java Virtual Machine which has recently gained notoriety after Twitter selected it as the basis of their future development. My language Reia is also a hybrid functional/imperative object/actor language. You may think given these similarities I would like Scala... but I don't.

I originally tried Scala back in 2007, shortly after I started becoming proficient in Erlang. I was extremely interested in the actor model at the time, and Scala provides an actor model implementation. It was one of the only languages I had discovered besides Erlang, Scheme, and Io which had attempted an actor model implementation, and Scala's actor support seemed heavily influenced by Erlang (even at the syntactic level). I was initially excited about Scala but slowly grew discontented with it.

At first I enjoyed being able to use objects within the sequential portions of my code and actors for the concurrent parts. This is an idea I would carry over into Ruby with the Revactor library, an actor model implementation I created for Ruby 1.9. Revactor let you write sequential code using normal Ruby objects, while using actors for concurrency. Like Scala, Revactor was heavily influenced by Erlang, to the point that I had created an almost identical translation of Erlang's actor API to Ruby, much as MenTaLguY had in his actor library Omnibus. MenTaLguY is perhaps the most concurrency-aware Ruby developer I have ever met, so I felt I may be on to something.

However, rather quickly I discovered something about using actors and objects in the same program: there was considerable overlap in what actors and objects do. More and more I found myself trying to reinvent objects with actors. I also began to realize that Scala was running into this problem as well. There were some particularly egregious cases. What follows is the most insidious one.

The WTF Operator

One of the most common patterns in actor-based programs is the Remote Procedure Call or RPC. As the actor protocol is asynchronous, RPCs provide synchronous calls in the form of two asynchronous messages, a request and a response.
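In plain Erlang the pattern is nothing more than a tagged request message followed by a selective receive for the matching reply. A minimal sketch:

%% client side: a synchronous call built from two asynchronous messages
rpc(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    end.

%% server side: a loop that answers such requests (here it just echoes them back)
server_loop() ->
    receive
        {From, Ref, Request} ->
            From ! {Ref, {ok, Request}},
            server_loop()
    end.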

RPCs are extremely prevalent in actor-based programming, to the point that Joe Armstrong, creator of Erlang, says:
95% of the time standard synchronous RPCs will work - but not all the time, that's why it's nice to be able to open up things and muck around at the message passing level.
Seeing RPCs as exceedingly common, the creators of Scala created an operator for it: "!?"

WTF?! While it's easy to poke fun at an operator that resembles an interrobang, the duplicated semantics of this operator are what I dislike. To illustrate the point, let me show you some Scala code:

response = receiver !? request

and the equivalent code in Reia:

response = receiver.request()

Reia can use the standard method invocation syntax because in Reia, all objects are actors. Scala takes an "everything is an object" approach, with actors being an additional entity which duplicates some, but not all, of the functions of objects. In Scala, actors are objects, whereas in Reia objects are actors.

Scala's object model borrows heavily from Java, which is in turn largely inspired by C++. In this model, objects are effectively just states, and method calls (a.k.a. "sending a message to an object") are little more than function calls which act upon and mutate those states.

Scala also implements the actor model, which is in turn inspired by Smalltalk and its messaging-based approach to object orientation. The result is a language which straddles two worlds: objects acted upon by function calls, and actors which are acted upon by messages.

Furthermore, Scala's actors fall prey to Clojure creator Rich Hickey's concerns about actor-based languages:
It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.
Reia offers a solution to this problem with its objects-as-actors approach: all actor-objects speak a common protocol, the "Object" protocol, and above that, they speak whatever methods belong to their class. Objects implicitly participate in the same actor protocol, because they all inherit the same behavior from their common ancestor.

Scala's actors... well... if you !? them a message they aren't explicitly hardcoded to understand (and yes nitpickers, common behaviors can be abstracted into functions) they will ?! at your message and ignore it.

Two kinds of actors?

One of the biggest strengths of the Erlang VM is its approach to lightweight concurrency. The Erlang VM was designed from the ground up so you don't have to be afraid of creating a large number of Erlang processes (i.e. actors). Unlike the JVM, the Erlang VM is stackless and therefore much better at lightweight concurrency. Erlang's VM also has advanced mechanisms for load balancing its lightweight processes across CPU cores. The result is a system which lets you create a large number of actors, relying on the Erlang virtual machine to load balance them across all the available CPU cores for you.
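To illustrate, spawning a hundred thousand processes on the Erlang VM is unremarkable; each one costs a few hundred machine words, not an OS thread, and the scheduler spreads them across cores:

Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, 100000)],
[Pid ! stop || Pid <- Pids].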

Because the JVM isn't stackless and uses native threads as its concurrency mechanism, Scala couldn't implement Erlang-style lightweight concurrency, and instead compromises by implementing two types of actors. One type of actor is based on native threads. However, native threads are substantially heavier than a lightweight Erlang process, limiting the number of thread-based actors you can create in Scala as compared to Erlang. To address this limitation, Scala implements its own form of lightweight concurrency in the form of event-based actors. Event-based actors do not have all the functionality of thread-based actors but do not incur the penalty of needing to be backed by a native thread.

Should you use a thread-based actor or an event-based actor in your Scala program? This is a case of implementation details (namely the JVM's lack of lightweight concurrency) creeping out into the language design. Projects like Kilim are trying to address lightweight concurrency on the JVM, and hopefully Scala will be able to leverage such a project in the future as the basis of its actor model and get rid of the threaded/evented gap, but for now Scala makes you choose.

Scala leaves you with three similar, overlapping constructs to choose from when modeling state, identity, and concurrency in programs: objects, event-based actors, and thread-based actors. Which should you choose?

Reia provides both objects and actors, but actors are there for edge cases and intended to be used almost exclusively by advanced programmers. Reia introduces a number of asynchronous concepts into its object model, and for that reason objects alone should suffice for most programmers, even when writing concurrent programs.

Advantages of Scala's approach

Reia's approach comes with a number of disadvantages, despite the theoretical benefits I've outlined above. For starters, Scala is a language built on the JVM, which is arguably the best language virtual machine available. Scala's sequential performance tops Erlang's, even if its performance in concurrent benchmarks typically lags behind.

Reia's main disadvantage is that its object model does not work like that of any other language in existence, unless you consider Erlang's approach an "object model". Objects, each being a shared-nothing, individually garbage collected Erlang process, are much heavier (approximately 300 machine words at minimum) than objects in your typical object oriented language (where I hear some runtimes offer zero overhead objects, or something). Your typical "throw objects at the problem" programmer is going to build a system where the more objects are involved, the more error prone it's going to become. Reia is a language which asks you to sit back for a second and ponder what can be modeled as possibly nested structures of lists, tuples, and maps instead of objects.

Reia does not allow cyclical call graphs, meaning that an object receiving a call cannot call another object earlier in the call graph. Instead, objects deeper in the call graph must interact with any previously called objects asynchronously. If your head is spinning now I don't blame you, and if you do understand what I'm talking about I cannot offer any solutions. Reia's call graphs must be acyclic, and I have no suggestions to potential Reia developers as to how to avoid this problem, besides being aware of the call flow and ensuring that all "back-calls" are done asynchronously. Cyclic call graphs result in a deadlock, one which can presently only be detected through timeouts, and they remain a terrible pathological case. I really wish I could offer a better solution, and I am eager to hear from anyone who can help me find one. This is far and away the biggest problem I have ever been faced with in Reia's object model and I am sad to say I do not have a good solution.
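For the curious, here's a stripped-down illustration of the pathological case in plain Erlang (not Reia, and not how Reia objects are actually implemented): two processes that each serve a synchronous call by making a nested synchronous call to the other. The first call blocks A, A's nested call blocks B, and B's call back into A can never be answered because A is still sitting in its inner receive; only the timeouts save you.

loop(Peer) ->
    receive
        {call, From, Ref} ->
            %% serving this call requires a nested synchronous call to our peer,
            %% which closes the cycle: the peer will call back into us while
            %% we're still blocked here waiting for its reply
            PRef = make_ref(),
            Peer ! {call, self(), PRef},
            receive
                {reply, PRef, Val} ->
                    From ! {reply, Ref, Val},
                    loop(Peer)
            after 2000 ->
                    From ! {reply, Ref, deadlocked}
            end
    end.

demo() ->
    A = spawn(fun() -> receive {peer, P} -> loop(P) end end),
    B = spawn(fun() -> receive {peer, P} -> loop(P) end end),
    A ! {peer, B},
    B ! {peer, A},
    Ref = make_ref(),
    A ! {call, self(), Ref},
    receive {reply, Ref, Result} -> Result after 5000 -> timeout end.
    %% Result is 'deadlocked': neither process could answer the other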

All that said, Scala's solution is so beautifully familiar! It works with the traditional OOP semantics of C++ which were carried over into Java, and this is what most OOP programmers are familiar with. I sometimes worry that the approach to OOP I am advocating in Reia will be rejected by developers who are familiar with the C++-inspired model, because Reia's approach is more complex and more confusing.

Furthermore, Scala's object model is not only familiar, it's extremely well-studied and well-optimized. The JVM provides immense capability to inline method calls, which means calls which span multiple objects can be condensed down to a single function call. This is possible because the Smalltalk-inspired illusion that these objects are receiving and sending messages is completely suspended, and objects are treated C++-style as mere chunks of state, so an inlined method call can act on many of them at once as if they were simple chunks of state. In Reia, all objects are concurrent, share no state, and can only communicate with messages. Inlining calls across objects is thoroughly impossible, since sending messages in Reia is not some theoretical construct: it's what really happens, and it cannot simply be abstracted away into a function call which mutates the state of multiple objects. Each object is its own world and synchronizes with the outside by sending other objects messages and waiting for their responses (or perhaps just sending messages to other objects, then forgetting about them and moving on).

So why even bother?

Why even bother pursuing Reia's approach then, if it's more complex and slow? I think its theoretical purity offers many advantages. Synchronizing concurrency through the object model itself abstracts away a lot of complexity. Traditional object usage patterns in Reia (aside from cyclical call graphs) have traditional object behavior, but when necessary, objects can easily be made concurrent by using them asynchronously. Because of this, the programmer isn't burdened with deciding what parts of the system need to be sequential and what parts concurrent ahead of time. They don't need to rip out their obj.method() calls and replace them with WTFs!? when they need some part of the system they didn't initially anticipate to be concurrent. Programmers shouldn't even need to use actors directly unless they're implementing certain actor-specific behaviors, like FSMs (the 5% case Joe Armstrong was talking about).

Why build objects on actors?

Objects aren't something I really have a concrete, logical defense of, as opposed to a functional approach. To each their own is all I can say. Object oriented programming is something of a cargo cult... its enthusiasts give defenses which often apply equally to functional programming, especially functional programming with the actor model as in Erlang.

My belief is that if concurrent programming can be mostly abstracted to the level of objects themselves, a lot more people will be able to understand it. People understand objects and the separation of concerns which is supposedly implicit in their identities, but present thread-based approaches to concurrency just toss that out the window. The same problem is present in Scala: layering the actor model on top of an object system means you end up with a confusing upper layer which works kind of like objects, but not quite, and runs concurrently, while objects don't. When you want to change part of the system from an object into an actor, you have to rewrite it into actor syntax and change all your lovely dots into !?s?!

Building the object system on top of the actor model itself means objects become the concurrency primitive. There is no divorce between objects and actors. There is no divorce between objects, thread-based actors, and event-based actors as in Scala. In Reia objects are actors, speak a common protocol, and can handle the most common use cases.

When objects are actors, I think everything becomes a lot simpler.

Sunday, April 5, 2009

Twitter: a followup

My last post about Twitter unsurprisingly drew the responses of the super duper microblogging interconnected developers at Twitter.

I'm still confused as to why Starling ever came about in the first place, but their reasoning for the move to Scala is a lot clearer now. They want a statically typed language to manage a larger codebase, and a faster language so they can grow to handle more load. It would seem their entire system is both stateful and high throughput, for which they need a high performance disk logged message queue.

However, for the one stateless part of their system, the webapp, I guess they're going to stick with Ruby, and the good old Matz Ruby Interpreter. JRuby won't run their app for some reason.

That's at least what I digested from the Twitter employees. Thanks for your replies.

Friday, April 3, 2009

Twitter: blaming Ruby for their mistakes?

"It's a poor workman who blames his tools..."
With the Internet abuzz about Google being in talks with Twitter, it seems that Ruby has become the proverbial whipping boy for Twitter's scaling problems. Twitter developer Alex Payne is now preaching Scala is the new Ruby, and panning Ruby for its technical foibles:
One of the things that I’ve found throughout my career is the need to have long-lived processes. And Ruby, like many scripting languages, has trouble being an environment for long lived processes. But the JVM is very good at that, because it’s been optimized for that over the last ten years. So Scala provides a basis for writing long-lived servers, and that’s primarily what we use it for at Twitter right now.
I've certainly been bitten by Ruby's poor garbage collection. The de facto Ruby interpreter uses a primitive mark-sweep garbage collection algorithm which slowly "leaks" memory over time as the heap fragments. It sure would be nice if Ruby had all sorts of exotic pluggable GC options the way the JVM does. Twitter developer Robey Pointer opines: "With Scala we could still write this really high level code, but be on the JVM." Yes, it sure would be nice if you could use a high level language like Ruby on the JVM.

Except Ruby does run on the JVM with JRuby, and this is something most Rubyists I know are aware of. I've been following JRuby pretty closely for the past several months because I am particularly interested in using Ruby for "always on" devices, so I need the compacting garbage collection the JVM provides. JRuby doesn't perform as well as Scala, but it is a fully implemented, viable, and in most cases better performing implementation than the de facto interpreter.

I can only assume someone at Twitter knows about and has used JRuby. Why this doesn't enter into their technology selection process is beyond me. Thanks to JRuby, "the interpreter sucks!" is no longer a valid complaint against Ruby, but that doesn't seem to prevent Twitter, one of the foremost Ruby-using companies in the world, from trashing it. This is ironic considering complaints that "Ruby doesn't scale!" are almost inextricably linked to Twitter's scaling problems, while other companies have managed huge Rails deployments without scaling problems. I do not envy the task Twitter has before them, and at my job I certainly don't deal with the sheer volumes of data they do (although I do still deal with asynchronous processing of a lot of data, using Ruby), but it's my belief that Twitter's scaling problems have much more to do with the culture at Twitter than they do with Ruby as a language.

At the heart of this Ruby vs. Scala debacle at Twitter is their message queue. Rather than choosing one of the hundreds of message queues that are already available (including ones written in Ruby), Twitter seemed to succumb to NIH and wrote their own. The result was Starling, a message queue which talks the memcache protocol (never mind there's already a message queue that does that too).

Starling is quite possibly one of the slowest and most poorly designed message queues in existence. I work for a company which, among other things, does a lot of message queue-driven asynchronous background processing of data using Ruby processes. When we selected a message queue, we surveyed at least a dozen of them, one of which was Starling. We did some basic simulated load testing, seeing how the queue performed for increasing numbers of readers/writers versus increasing message volumes. Starling's performance was utterly abysmal. As we increased the number of readers/writers to Starling its performance started nearing rock bottom quickly.

As I perhaps somewhat self-aggrandizingly consider myself one of the most knowledgable people regarding I/O in the Ruby world, I decided to peek around the Starling source and see what I discovered. What I found was a half-assed and pathetically underperforming reinvention of EventMachine, an event-based networking framework for Ruby which is the Ruby answer to the Twisted framework from Python. EventMachine is built on an underlying C++ implementation, and while the API it exposes is rather ugly, it's quite fast. This gross oversight was not present in the other message queue available for Ruby, which benchmarked substantially faster than Starling. Eventually Starling would be forked as "Evented Starling" and this gross oversight would be corrected.

As someone who has contributed to the EventMachine project and written my own high performance Ruby event framework, this is my reaction to the design of Starling:


It's not as if it's particularly hard to write a message queue. For shits and grins I wrote my own in Erlang just to compare it to Starling. The result was more or less as full featured as Starling, but performed a few orders of magnitude better, and was 1/10th the size (150 lines of code as opposed to 1500). My queue doesn't perform nearly as well as mature, open source alternatives, but it was a fun exercise to gauge just how badly the Twitter people failed.
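Not the queue described above (that code isn't reproduced here), but to show how naturally a bare-bones FIFO queue process falls out of Erlang's primitives, here's a toy sketch:

-module(toy_queue).
-export([start/0, enqueue/2, dequeue/1]).

%% spawn a process holding a functional queue as its state
start() -> spawn(fun() -> loop(queue:new()) end).

%% asynchronous enqueue
enqueue(QPid, Item) ->
    QPid ! {enqueue, Item},
    ok.

%% synchronous dequeue: returns {ok, Item} or empty
dequeue(QPid) ->
    Ref = make_ref(),
    QPid ! {dequeue, self(), Ref},
    receive {Ref, Reply} -> Reply end.

loop(Q) ->
    receive
        {enqueue, Item} ->
            loop(queue:in(Item, Q));
        {dequeue, From, Ref} ->
            case queue:out(Q) of
                {{value, Item}, Q2} -> From ! {Ref, {ok, Item}}, loop(Q2);
                {empty, Q2}         -> From ! {Ref, empty}, loop(Q2)
            end
    end.

A real queue obviously also needs things like persistence, flow control, and a wire protocol, which is where the rest of the code goes.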

Starling was clearly the first attempt of its authors to write a high performance network server in Ruby, and they miserably failed. I've never seen an explanation from Twitter as to why they felt existing message queues were inadequate. However, it became painfully clear that Starling was woefully inadequate:
By mid-2008, one of these Ruby message queues completely crashed and developers needed two and a half hours to shove the dropped Tweets back through the system. When your game is micro-blogging, that's a lifetime.
Yes, with absolutely zero experience in writing high performance network servers in Ruby, Twitter's NIH led them to homebrew their own message queue. And surprise surprise, it failed miserably! What was Twitter's reaction? Did they start looking for a better, open source message queue system written by people who are actually competent to develop message queues? No, of course not, more NIH to the rescue:
Then, in his spare time, one developer ported the code to Scala. According to Payne, the Scala queue could process the same message backlog in 20 seconds.
Yes, clearly Ruby is the problem, and more NIH is the solution. The result was Kestrel, a new message queue written in Scala which nobody but Twitter uses. It performs a lot better than Starling, though! Just not as well as RabbitMQ, a queue so fast certain crazy people I know are streaming video through it in realtime.

I've never seen Twitter's rationale for writing their own message queue in the first place. Reading the list of requirements given in the Kestrel description, I'm completely confused as to why MemcacheQ does not meet their needs. If you're willing to drop the "use the memcache protocol" requirement there are dozens of queues which would seem to fit their needs, with better performance than Kestrel.

I'm uncertain as to what else Twitter is using Scala for besides its message queue. Given all their myopic and seemingly JRuby-unaware harping on using Ruby for background jobs:
"And that wall was not so much it code but in limitations in Ruby virtual machine. There's a lot of things that Ruby is great at, but long running processes? Particularly memory intensive ones? Not so much."
...I'm guessing they are in the process of ripping out all the background jobs and rewriting them in Scala. But the message queue remains the center point of their argument. The failures of the message queue are Twitter's own, not Ruby's.

I would be curious to hear what arguments, if any, Twitter had against JRuby, or other message queues. The public arguments I've seen, and the decision making process I'm inferring from them, seem incredibly inept. This matches something I've heard from people who deal with the Twitter folks: their scaling problems come not so much from Ruby but from bad design decisions.

Overall, I think they're missing the point of Ruby. Ruby shines as an automation and "glue code" language, providing you with a swiss army knife that lets you easily integrate many software components written in whatever language you want into a single, cohesive application. Message queues are commoditized in the software world, and clients exist for virtually all of them in the Ruby world. AMQP, XMPP, Stomp, you name it and chances are Ruby speaks it. The EngineYard folks are doing wonderful things with Ruby and XMPP/AMQP in their Vertebra and Nanite projects. The Twitter folks are... off reinventing message queues in Ruby, then blaming Ruby when their implementation turns out to be slow.

In conclusion... is Ruby a bad language for writing message queues in? Yes, there are much better choices. Message queues are a particularly performance critical piece of software, which requires that your language have excellent I/O capabilities and a decent concurrency story. Ruby has neither, so it's no wonder Starling failed miserably. But message queues aren't something you should be writing yourself. This speaks much more to Twitter's culture of NIH than it does to Ruby as a language.

Is Ruby a bad language for writing long-running processes? Absolutely not. JRuby brings the JVM's state-of-the-art garbage collectors to the Ruby world, the exact same technology available to Scala. JRuby addresses all of their concerns about long-running processes, but they don't bother to mention it and instead just point out the problems of the de facto Ruby interpreter.

I expect this debate is raging inside Twitter and we're only seeing the surface of it. My apologies to Twitter if there is actually a well thought out rationale for what they're doing, but if so, the public message (and software) you're giving to the world is entirely unconvincing.

Update: If you check the comments you'll see the Twitter folks have clarified their position, and I've created a new post in response. I think their arguments for Scala and Kestrel are certainly reasonable, and their position makes much more sense when it's based on Scala's strengths, not Ruby's weaknesses. Twitter is a great service that I use every day (I mean, I have a Twitter sidebar on my blog and all), despite its occasional stability problems. I wish them luck on their new Scala-based backend and hope they can get these stability problems licked.

Tuesday, March 24, 2009

Is Reia an object oriented language Joe Armstrong could actually like?

Short answer: no! Long answer: he might hate it less than the others.

Erlang creator Joe Armstrong is no fan of OOP. He says it sucks! Oof! As someone who is building an object oriented language on a platform pioneered by Joe, I feel I'm at least obligated to respond to his concerns. Due to its strong Erlang influences, Reia's object system works quite differently from most other OOP languages, so in some ways it does address his concerns.

Let's look at them:

Objection 1: Data structure and functions should not be bound together

So first, a confession: Reia's "builtin" types (which I'll get to later) are guilty of this. So -1 on my Joe score.

However, Reia's objects are more than just data structures. This is certainly a complaint about OOP which holds for most languages: "object" is really a fancy word for "a big chunk of state," and sending an object a "message" is really just more highfalutin verbiage for "making a function call." In this regard OOP ends up seeming pretty imperative: it's just a big fancy facade for calling functions which probe and mutate state.

Similar concerns were noted by Clojure author Rich Hickey in his article On State and Identity. He notes that in most languages objects don't have real identities beyond their state and that OOP at its core is very much an imperative affair.

In Reia, all objects are Erlang processes and operate concurrently. Each object has a life of its own... it's not just some lifeless chunk of data sitting somewhere in RAM, completely at the mercy of whatever modifications some random thread may throw at it.

Calling a method is more than just a function call. It uses messages... not goofy metaphorical messages but real, first-class messages. Presently Reia's objects are built on gen_server, so invoking a method on an object performs an RPC on the remote object.
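Roughly speaking (with hypothetical names, not Reia's actual generated code), the shape of that dispatch is an ordinary gen_server whose handle_call clause routes a method name and arguments to a function over the object's hidden state:

%% Rough sketch only -- hypothetical names, not the code Reia generates.
%% A method call is a gen_server RPC against the process holding the object's state.
-module(object_sketch).
-behaviour(gen_server).
-export([start_link/0, call_method/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% "invoking a method" = a synchronous message to the object's process
call_method(Object, Method, Args) ->
    gen_server:call(Object, {method, Method, Args}).

init([]) ->
    {ok, dict:new()}.    % the object's private state

handle_call({method, Method, Args}, _From, State) ->
    {Result, NewState} = dispatch(Method, Args, State),
    {reply, Result, NewState}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% a couple of example "methods"
dispatch(set, [Key, Value], State) -> {Value, dict:store(Key, Value, State)};
dispatch(get, [Key], State)        -> {dict:find(Key, State), State}.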

Self-ascribed OOP creator Alan Kay is practically obsessed with messages. He once wrote: "I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning -- it took a while to see how to do messaging in a programming language efficiently enough to be useful)." What he began with messages in Smalltalk would eventually be boiled down to function calls in most successor OOP languages (with C++ being perhaps the most infamous example).

Reia does not follow in this tradition, but maintains a very message-centric view of OOP which is more akin to what Alan Kay originally had in mind. I've also read that Alan Kay has claimed that not making objects concurrent was one of his greatest mistakes, although I don't have a specific citation on that one (if someone knows please leave me a comment).

Erlang, by way of the actor model and its outgrowth from Smalltalk, certainly embodies similar concepts.

Objection 2: Everything has to be an object

Well, this is an easy one to answer: not everything has to be an object! I'm sure any Rubyist readers are groaning at that one. Pretty much every language I see coming out of other Rubyists follows this paradigm. Reia doesn't.

Reia has three fundamental types of entities: builtins, processes, and objects. Objects can be thought of as a type of process, but all objects respond to a common base set of messages. Reia's objects can thus address a particular concern of Rich Hickey's about actor-based systems:
"You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol."
Reia implements an inheritance system, which means that you can group the protocols that you want a given set of objects to respond to together into superclasses. All objects will, at the very minimum, respond to the set of methods defined in the Object superclass.

You can certainly do the same in Erlang by placing common functionality into a "superclass" module. Reia just gives it first-class syntax and does some of the thunking a bit more automatically.
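For instance (hypothetical modules, purely for illustration), the hand-rolled Erlang version of "inheritance" is just delegation to a shared module:

%% object_base.erl -- the hand-rolled "superclass" (illustrative names only)
-module(object_base).
-export([inspect/1]).

inspect(State) ->
    io_lib:format("#<object ~p>", [State]).

%% dog.erl -- a "subclass" that delegates common behavior back to object_base
-module(dog).
-export([inspect/1, bark/1]).

bark(_State) ->
    woof.

inspect(State) ->
    object_base:inspect(State).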

Reia also provides processes, which are for all intents and purposes identical to Erlang processes. For some reason many OO purists have messaged me asking if this is necessary and suggesting I rip it out. Reia does provide asynchronous features as part of its object system... it has (or rather, will have) equivalents for gen_server:cast and gen_server:reply. Given this, they argue, who needs processes?

I think these people don't understand that gen_server is not a one-size-fits-all solution. Since I'm building a language on top of Erlang, it would be silly to hide the full power of its messaging system and force everything into gen_server-shaped boxes. Reia goes the opposite route and provides processes which are virtually identical to Erlang's. Reia even has the same "!" syntax for sending messages, and it works on objects as well as processes.
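For comparison (again just a sketch with made-up names), a plain process is nothing but a spawned receive loop and the "!" operator, with no gen_server in sight:

%% echo.erl -- a bare process, illustrative only: just spawn, receive, and "!"
-module(echo).
-export([start/0]).

start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {greet, From} ->
            From ! hello,    % reply by sending a plain message back
            loop();
        stop ->
            ok
    end.

%% usage from the shell:
%%   Pid = echo:start(),
%%   Pid ! {greet, self()},
%%   receive hello -> ok end.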

Reia's builtin types are made out of plain old immutable Erlang terms. They are not objects themselves. With a little bit less "throw objects at the problem" they can serve to represent all kinds of things. My hope is that programmers will use builtins more and objects less than they would in a typical OOP language.

Objection 3: In an OOPL data type definitions are spread out all over the place

Guilty as charged... take another point off my Joe score. Next!

Objection 4: Objects have private state

Also guilty. However, I hope Reia gets some bonus points in this regard for the way it implements its private state. Behind the scenes Reia's representation of "instance variables" is a pure functional one. The compiled form has the "instance variable" state (represented as a dict) enter a method (now a function) as an argument. The state is versioned throughout the compiled form, so wherever there is potential for change it's bound to a new label. When the method completes the final state is returned.
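As a hypothetical illustration (made-up names, not actual Reia compiler output), a method that updates an instance variable might compile down to something shaped like this:

%% Hypothetical shape of a compiled method, not actual compiler output.
%% The instance variable state enters as a dict argument, each "assignment"
%% binds a new version, and the final state is returned along with the result.
deposit(Amount, State0) ->
    Balance0 = dict:fetch(balance, State0),
    Balance1 = Balance0 + Amount,                     % @balance += amount
    State1   = dict:store(balance, Balance1, State0),
    {Balance1, State1}.                               % {return value, new state}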

Because this state is represented in a pure functional form behind the scenes it means Reia retains some of the benefits it brings in Erlang. For example: private state is transactional, as the object's true state isn't actually updated until method dispatch has completed, in the same way a gen_server's state isn't updated until one of the handle callbacks has returned with the new state.

It also means the scope of private state is limited. In any case I've encountered where I couldn't implement state modification in a simple, pure functional form (such as lambdas), I've avoided it. Lambdas are able to access the private state (which they close over at the time they're declared) but they are not allowed to modify it.

This approach has been more complex than alternatives like just giving the process dictionary a first class syntax. I could've done that. However, compiling to a pure functional form really helped me see the corner cases where the state was trying to creep into strange scopes. The current implementation has precise semantics about state flow and works in a way not too far distant from how an Erlang programmer would actually write it.

What's the point?

I think that question is pretty much the sentiment of Joe's article. Is OO actually useful? Given certain technical drawbacks (i.e. side effects) introduced by hidden state it becomes more difficult to make the same statements about a Reia object that you could about an Erlang gen_server (and destructive assignment doesn't help things). Is there actually some value objects bring to the table?

The merits of objects aren't technical, in my opinion. I think they're a good model for certain types of people to conceptualize problems. Lots of people really like them. Joe tried it and said it felt "wrong" but for many it does feel "right" in some way.

Fortunately, for the people who think it feels "wrong" there's this great language called Erlang you can use...

What does Joe Armstrong actually think about Reia?

I saw Joe Armstrong was asked about Reia in an Erlang Inside interview; here's what he had to say:
"I haven’t tried it - I don’t like mutable anything"
D'oh! Oh well. I'm afraid you won't be seeing a single-assignment Reia. I'll save that for another post, though.

Monday, March 9, 2009

I'm speaking at Erlang Factory conference in April

Anyone who visits my blog (as opposed to just reading it via the RSS feed) has probably noticed the badge in my sidebar, but I figured that for the rest of you I should probably announce it...

I'll be speaking at the Erlang Factory conference in San Francisco (April 27th to May 1st). As you may have guessed, my talk will cover Reia, but as I don't believe Reia will be particularly interesting to existing Erlang enthusiasts, I'm going to start things off with a crash course in building languages on top of the Erlang VM. I hope to develop the content of the presentation on my blog over the next several weeks. You'll get the meat and potatoes here spread out over several posts, and a more in-depth, polished version if you actually attend the conference.

I hope to see you there!

Sunday, March 8, 2009

Reia: no longer indentation-sensitive

Tonight I merged the "indentless" branch of Reia into master. This means that Reia is no longer indentation sensitive. Ironically enough, some of the advice driving me to abandon indentation-based syntax comes from Python creator Guido van Rossum:
"The unspoken, right brain constraint here is that the complexity introduced by a solution to a design problem must be somehow proportional to the problem's importance"
I'm ready to move on to bigger and better things than just twiddling around with my language's grammar, trying to find ways of using indentation to solve minor syntactic annoyances. Any such attempts translate into major headaches for anyone trying to invent a template language.

I was amazed at how quickly I was able to rip out indentation sensitivity and how much it simplified my grammar. There were some particularly interesting commits along the way such as when I ripped indentation sensitivity out of the lexer, replacing it with a single new keyword.

All that said, there are unfortunately now numerous code fragments based on the indentation-sensitive syntax posted throughout the Reia Wiki. I'll be working to clean these up, but if you see one and want to help out, feel free to fix them (although you will have to register with the wiki to make edits). The new syntax looks an awful lot like Ruby, so if you make your best guess adding end keywords where it'd make sense in Ruby, you'll probably be right.

Tuesday, March 3, 2009

Indentation sensitivity: a post-mortem?

I've been designing my language Reia with an indentation-sensitive syntax like Python... until now.

Things went fairly smoothly at first. I read a short description of how Python's lexer works and in no time banged out an equivalent one with leex. Inspired by this, I proceeded constructing my grammar, largely defaulting to Ruby or Erlang in cases I was uncertain of and hardly ever looking at the Python grammar. I wanted Reia to have a purely expression-based syntax where everything returns a value, as opposed to a Pythonic grammar with both expressions and statements which return no value. This would prove to be my undoing.

At the time I didn't understand the beautiful set of compromises Guido van Rossum managed to make when designing the Python grammar. I became curious as to why my language had multi-line lambdas and Python did not. The answer to that question would lie in a problem with my grammar I would soon discover and could not come up with a good solution to.

I first discovered the problem with Reia's multi-line blocks. I had used them in the past in small 1-2 liner functions without issue, and all had seemed well. Then some time later I tried to add more to a function after a multi-line block. I ran into an unexpected syntax error... I couldn't put any additional expressions after a multi-line block.

Blocks are more or less lambdas, so it would seem as if I had hit the infamous multi-line lambda problem. I compared the output of my lexer to Python's and found the two to be identical. The answer would lie in the grammar. Looking at the tokens, I saw that Python had no statement separator between multi-line statements.

From Python's grammar.txt:

statement ::= stmt_list NEWLINE | compound_stmt

What does this line mean? All ordinary statements (i.e. stmt_lists), such as expressions, and anything that fits on a single line must be terminated by a NEWLINE. The "compound statements", ones which span multiple lines, were self-contained. There was no statement separator between them and the next statement. In the Zen of Python they just are, with no explicitly delineated boundaries.

This underlies the magic of Python's "indent consistently and your code will be valid" syntax. I could not duplicate this using a Python-style lexer. The best I could come up with was modifying the lexer to accept a few different cases of empty lines as being syntactically significant, which meant indentation sensitivity would be more obtrusive.

Reading Guido van Rossum's rationale against multi-line lambdas in Python, Language Design Is Not Just Solving Puzzles, he lays out his case: any solution that embeds an indentation-based block in the middle of an expression is unacceptable, and an alternate way of grouping syntax besides indentation is unacceptable. I certainly agree with the latter... it's silly to have some expression/statement groupings indentation based and some grouped by enclosing tokens. The former was a bit more subtle... I didn't realize what separating out statements from expressions afforded in the grammar (specifically in regard to the self-contained "compound statements").

I started noticing other mismatches between indentation sensitivity and an expression-based grammar which aren't present in Python. Implicit line joining in Python removes tokens which are useless in the given context (which must be inside an expression), since expressions can't contain statements. In Reia's previous form, implicit line joining would rule out a whole host of otherwise grammatically valid constructions embedding expressions with indent blocks inside other expressions. But without implicit line joining, the programmer is very limited in terms of splitting up function arguments, lists, dicts, and tuples across multiple lines.

I also started to notice all sorts of strange syntactic oddities which emerge in a grammar which embeds indentation blocks in statements. This is exactly the kind of ugliness I believe Guido sought to avoid by disallowing indent blocks within expressions.

Unhappy with that direction, I looked to alternatives which preserve indentation sensitivity. The Haskell approach to indentation seemed interesting, and I found an expression-based Python-like language with Haskell-like indentation: Logix. It turned out to be by Tom Locke, who would go on to author Hobo, the web application builder for Rails. (Edit: Tom Locke mailed me back saying he doesn't like indentation-based syntaxes any more and now prefers Ruby)

This seemed like a potential direction to go in, but at this point, I was sick of indentation sensitivity. My conclusion is it works very well in Python, and if you're content to separate your grammar into statements and expressions and only have indent blocks in statements, it's a great way to go with a language.

I simply wasn't content to abandon an expression-based grammar. I needed multi-line expressions, specifically for Ruby-style blocks, if nothing else. In a totally unscientific poll of Rubyists I put together, the #1 feature with a 75% approval rating was blocks. I couldn't have multi-line expressions in an indentation sensitive language where I desperately desired them for blocks. So something had to give...

Last night I pushed a new "indentless" branch for Reia. Without indentation sensitivity the Python influence is starting to wane and it's beginning to look a lot like Ruby. I spent tonight cleaning it up and making sure it can build itself from scratch, as well as converting all the code examples and remaining parts of the standard library.

This is quite likely the direction Reia is going to go. So far the response to abandoning indentation sensitivity has been overwhelmingly positive. It will certainly be a boon for people making template languages as now it will be quite easy to put together something like ERb or a template language which embeds Reia syntax with little worry.

Short of someone making an impassioned plea for preserving indentation sensitivity, it will soon be gone from Reia.

What do Rubyists like about Ruby?

I put together a totally unscientific gimpy little web poll and posted it to the ruby-talk mailing list, asking Rubyists what their favorite Ruby features are. I received a little over 60 responses. Here are the top 10 (at the time this was posted):
  1. Blocks (71%)
  2. Lambdas / Procs (57%)
  3. Mix-ins / Modules (55%)
  4. Open classes / monkeypatching (53%)
  5. Duck-typing (53%)
  6. First-class regexps (49%)
  7. Metaprogramming (49%)
  8. Syntactic sugar e.g. val ||= x; array << val
  9. Optional parens / English-like readability (44%)
  10. Object#method_missing (42%)
There are no real surprises here. Rubyists love their blocks, but perhaps even more than that, Rubyists love closures. Talking to some other people about why lambdas / procs ranked so high, the general sentiment was that they're a feature used somewhat infrequently but very handy when needed.

The mixin approach to inheritance is also quite popular. This lies alongside open classes and monkeypatching. Rubyists really seem to enjoy the ability to manipulate code inside the runtime environment, and we also see metaprogramming not too far down the list.

First class regexps have always been one of my favorites and I was glad to see they fared well. Beyond that, the type system, syntactic sugar, and method_missing were also popular.

I've left the poll open if you'd still like to vote:

  • Creating Domain Specific Languages
  • Ternary expressions e.g. foo ? bar : baz
  • Object#class / Object#is_a?
  • Object#method(s) / Module#instance_method(s)
  • Reflection
  • Lambdas / procs
  • Mix-ins / Modules
  • Object#instance_eval / Object#instance_exec
  • Redefining the behavior of core types
  • Standard library (and its use of blocks)
  • Instance-specific behavior / metaclasses
  • First class regexps
  • Optional parens / English-like readability
  • Definable MyObject#method=
  • Module#module_eval / Module#class_eval
  • Use of operators by core types (e.g. Array#*)
  • Object#method_missing
  • Operator overloading (in novel ways)
  • RubyGems
  • Open classes / monkeypatching
  • Syntactic sugar e.g. val ||= x; array << val
  • Code in class body (and metaprogramming with it)
  • Duck typing
  • Enumerators (Ruby 1.8.7+/Ruby 1.9)
  • Embedding code in strings with "#{mycode}"
  • Definable MyObject#[] and MyObject#[]=
  • Enumerable
  • Blocks
  • Singleton classes
  • Module#define_method
  • Splats e.g. method(*args)
  • ObjectSpace
  • Metaprogramming
  • Object#instance_variable_get / Object#instance_variable_set
  • Continuations e.g. callcc
  • MyObject#method? and MyObject#method! idiom