Tuesday, March 24, 2009

Is Reia an object oriented language Joe Armstrong could actually like?

Short answer: no! Long answer: he might hate it less than the others.

Erlang creator Joe Armstrong is no fan of OOP. He says it sucks! Oof! As someone who is building an object oriented language on a platform pioneered by Joe, I feel I'm at least obligated to respond to his concerns. Due to its strong Erlang influences Reia's object system ends up working a lot different than most other OOP languages so in some ways this does address some of his concerns.

Let's look at them:

Objection 1: Data structure and functions should not be bound together

So first, a confession: Reia's "builtin" types (which I'll get to later) are guilty of this. So -1 on my Joe score.

However, Reia's objects are more than just data structures. This is certainly a complaint about OOP which holds for most languages: "object" is really a fancy word for "a big chunk of state" and sending an object a "message" is really some more high falloting verbeage for "making a function call". In this regard OOP ends up seeming pretty imperative: it's just a big fancy facade for calling functions which probe and mutate state.

Similar concerns were noted by Clojure author Rich Hickey in his article On State and Identity. He notes that in most languages objects don't have real identities beyond their state and that OOP at its core is very much an imperative affair.

In Reia, all objects are Erlang processes and operate concurrently. Each object has a life of it's own... it's not just some lifeless chunk of data sitting somewhere in RAM, completely at the mercy of whatever modifications some random thread may throw at it.

Calling a method is more than just a function call. It uses messages... not goofy metaphorical messages but real, first-class messages. Presently Reia's objects are built on gen_server, so invoking a method on an object performs an RPC on the remote object.

Self-ascribed OOP creator Alan Kay is practically obsessed with messages. He once wrote: "I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning -- it took a while to see how to do messaging in a programming language efficiently enough to be useful)." What he began with messages in Smalltalk would eventually be boiled down to function calls in most successor OOP languages (with C++ being perhaps the most infamous example).

Reia does not follow in this tradition, but maintains a very message-centric view of OOP which is more akin to what Alan Kay originally had in mind. I've also read that Alan Kay has claimed that not making objects concurrent was one of his greatest mistakes, although I don't have a specific citation on that one (if someone knows please leave me a comment).

Erlang vis-a-vis the actor model and its outgrowth from Smalltalk certainly embodies similar concepts.

Objection 2: Everything has to be an object

Well, this is an easy one to answer: everything doesn't have to be an object! I'm sure any Rubyist readers are groaning on that one. Pretty much every language I see coming out of other Rubyists follows this paradigm. Reia doesn't.

Reia has three fundamental types of entities: builtins, processes, and objects. Objects can be thought of as a type of process, but all objects respond to a common base set of messages. Reia's objects can thus address a particular concern of Rich Hickey's about actor-based systems:
"You can't observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol."
Reia implements an inheritance system, which means that you can group the protocols that you want a given set of objects to respond to together into superclasses. All objects will, at the very minimum, respond to the set of methods defined in the Object superclass.

You can certainly do the same in Erlang by placing common functionality into a "superclass" module. Reia just gives it first-class syntax and does some of the thunking a bit more automatically.

Reia also provides processes, which are for all intents and purposes identical to Erlang processes. For some reason many OO purists have messaged me asking if this is necessary and suggesting I rip it out. Reia does provide asynchronous features as part of its object system... it has (or rather, will have) equivalents for gen_server:cast and gen_server:reply. Given this, they argue, who needs processes?

I think these people don't understand that gen_server is not a one-size-fits-all solution. Since I'm building a language on top of Erlang it would be silly not to expose the full power of its messaging system and try to force everything into gen_server-shaped boxes. Reia goes the opposite route and provides processes which are virtually identical to Erlang. Reia even has the same "!" syntax for sending messages, and it works on objects as well as processes.

Reia's builtin types are made out of plain old immutable Erlang terms. They are not objects themselves. With a little bit less "throw objects at the problem" they can serve to represent all kinds of things. My hope would be programmers use builtins more and objects less than they would in a typical OOP language.

Objection 3: In an OOPL data type definitions are spread out all over the place

Guilty as charged... take another point off my Joe score. Next!

Objection 4: Objects have private state

Also guilty. However, I hope Reia gets some bonus points in this regard for the way it implements its private state. Behind the scenes Reia's representation of "instance variables" is a pure functional one. The compiled form has the "instance variable" state (represented as a dict) enter a method (now a function) as an argument. The state is versioned throughout the compiled form, so wherever there is potential for change it's bound to a new label. When the method completes the final state is returned.

Because this state is represented in a pure functional form behind the scenes it means Reia retains some of the benefits it brings in Erlang. For example: private state is transactional, as the object's true state isn't actually updated until method dispatch has completed, in the same way a gen_server's state isn't updated until one of the handle callbacks has returned with the new state.

It also means the scope of private state is limited. Any cases I've encountered (such as lambdas) where I cannot implement state modifications in a simple, pure functional form I've avoided it. Lambdas are able to access the private state (which they close over at the time they're declared) but they are not allowed to modify it.

This approach has been more complex than alternatives like just giving the process dictionary a first class syntax. I could've done that. However, compiling to a pure functional form really helped me see the corner cases where the state was trying to creep into strange scopes. The current implementation has precise semantics about state flow and works in a way not too far distant from how an Erlang programmer would actually write it.

What's the point?

I think that question is pretty much the sentiment of Joe's article. Is OO actually useful? Given certain technical drawbacks (i.e. side effects) introduced by hidden state it becomes more difficult to make the same statements about a Reia object that you could about an Erlang gen_server (and destructive assignment doesn't help things). Is there actually some value objects bring to the table?

The merits of objects aren't technical, in my opinion. I think they're a good model for certain types of people to conceptualize problems. Lots of people really like them. Joe tried it and said it felt "wrong" but for many it does feel "right" in some way.

Fortunately, for the people who think it feels "wrong" there's this great language called Erlang you can use...

What does Joe Armstrong actually think about Reia?

I saw Joe Armstrong was asked about Reia in an Erlang Inside interview, here's what he had to say:
"I haven’t tried it - I don’t like mutable anything"
D'oh! Oh well. I'm afraid you won't be seeing a single-assignment Reia. I'll save that for another post, though.

Monday, March 9, 2009

I'm speaking at Erlang Factory conference in April

Anyone who visits my blog (as opposed to just reading it via the RSS feed) has probably noticed the badge in my sidebar, but I figured that for the rest of you I should probably announce it...

I'll be speaking at the Erlang Factory conference in San Francisco (April 27th to May 1st). As you may have guessed my talk will cover Reia, but as I don't believe Reia will be particularly interesting to existing Erlang enthusaists, I'm going to start things off with a crash course in building languages on top of the Erlang VM. I hope to develop the content of the presentation on my blog over the next several weeks. You'll get the meat and potatoes here spread out over several posts, and a more in-depth, polished version if you actually attend the conference.

I hope to see you there!

Sunday, March 8, 2009

Reia: no longer indentation-sensitive

Tonight I merged the "indentless" branch of Reia into master. This means that Reia is no longer indentation sensitive. Ironically enough, some of the advice driving me to abandon indentation-based syntax comes from Python creator Guido van Rossum:
"The unspoken, right brain constraint here is that the complexity introduced by a solution to a design problem must be somehow proportional to the problem's importance"
I'm ready to move on to bigger and better things than just twiddling around with my language's grammar, trying to find ways of using indentation to solve minor syntactic annoyances. Any such attempts translate into major headaches for anyone trying to invent a template language.

I was amazed at how quickly I was able to rip out indentation sensitivity and how much it simplified my grammar. There were some particularly interesting commits along the way such as when I ripped indentation sensitivity out of the lexer, replacing it with a single new keyword.

All that said, there are unfortunately now numerous code fragments based on the indentation-sensitive syntax posted throughout the Reia Wiki. I'll be working to clean these up, but if you see one and want to help out, feel free to fix them (although you will have to register with the wiki to make edits). The new syntax looks an awful lot like Ruby, so if you make your best guess adding end keywords where it'd make sense in Ruby, you'll probably be right.

Tuesday, March 3, 2009

Indentation sensitivity: a post-mortem?

I've been designing my language Reia with an indentation-sensitive syntax like Python... until now.

Things went fairly smoothly at first. I read a short description of how Python's lexer works and in no time banged out an equivalent one with leex. Inspired by this, I proceeded constructing my grammar, largely defaulting to Ruby or Erlang in cases I was uncertain of and hardly ever looking at the Python grammar. I wanted Reia to have a purely expression-based syntax where everything returns a value, as opposed to a Pythonic grammar with both expressions and statements which return no value. This would prove to be my undoing.

At the time I didn't understand the beautiful set of compromises Guido van Rossum managed to make when designing the Python grammar. I became curious as to why my language had multi-line lambdas and Python did not. The answer to that question would lay in a problem with my grammar I would soon discover and could not come up with a good solution to.

I first discovered the problem with Reia's multi-line blocks. I had used them in the past in small 1-2 liner functions without issue, and all had seemed well. Then some time later I tried to add more to a function after a multi-line block. I ran into an unexpected syntax error... I couldn't put any additional expressions after a multi-line block.

Blocks are more or less lambdas, so it would seem as if I had hit the infamous multi-line lambda problem. I compared the output of my lexer to Python's and found the two to be identical. The answer would lie in the grammar. Looking at the tokens there was no statement separator between multi-line statements in Python.

From Python's grammar.txt:

statement ::= stmt_list NEWLINE | compound_stmt

What does this line mean? All ordinary statements (i.e. stmt_lists), such as expressions, and anything that fits on a single line must be terminated by a NEWLINE. The "compound statements", ones which span multiple lines, were self-contained. There was no statement separator between them and the next statement. In the Zen of Python they just are, with no explicitly delineated boundaries.

This underlies the magic of Python's "indent consistently and your code will be valid" syntax. I could not duplicate this using a Python-style lexer. The best I could come up with was modifying the lexer to accept a few different cases of empty lines as being syntactically significant, which meant indentation sensitivity would be more obtrusive.

Reading Guido van Rossum's rationale against multi-line lambdas in Python, Language Design Is Not Just Solving Puzzles, he lays out his case: any solution that embeds an indentation-based block in the middle of an expression is unacceptable, and an alternate way of grouping syntax besides indentation is unacceptable. I certainly agree with the latter... it's silly to have some expression/statement groupings indentation based and some grouped by enclosing tokens. The former was a bit more subtle... I didn't realize what separating out statements from expressions afforded in the grammar (specifically in regard to the self-contained "compound statements").

I started noticing other mismatches between indentation sensitivity and an expression-based grammar which aren't present in Python. Implicit line joning in Python removes tokens which are useless in the given context (which much be inside an expression) since expressions can't contain statements. In Reia's previous form implicit line joining would rule out a whole host of otherwise gramatically valid constructions embedding expressions with indent blocks inside other expressions. But without implicit line joining, the programmer is very limited in terms of splitting up function arguments, lists, dicts, and tuples across multiple lines.

I also started to notice all sorts of strange syntactic oddities which emerge in a grammar which embeds indentation blocks in statements. This is exactly the kind of ugliness I believe Guido sought to avoid by disallowing indent blocks within expressions.

Unhappy with that direction, I looked to alternatives which preserve indentation sensitivity. The Haskell approach to indentation seemed interesting, and I found a expression-based Python-like language with Haskell-like indentation: Logix. It turned out to be by Tom Locke, who would go on to author Hobo, the web application builder for Rails. (Edit: Tom Locke mailed me back saying he doesn't like indentation-based syntaxes any more and now prefers Ruby)

This seemed like a potential direction to go in, but at this point, I was sick of indentation sensitivity. My conclusion is it works very well in Python, and if you're content to separate your grammar into statements and expressions and only have indent blocks in statements, it's a great way to go with a language.

I simply wasn't content to abandon an expression-based grammar. I needed multi-line expressions, specifically for Ruby-style blocks, if nothing else. In a totally unscientific poll of Rubyists I put together, the #1 feature with a 75% approval rating was blocks. I couldn't have multi-line expressions in an indentation sensitive language where I desperately deisred them for blocks. So something had to give...

Last night I pushed a new "indentless" branch for Reia. Without indentation sensitivity the Python influence is starting to wane and it's beginning to look a lot like Ruby. I spent tonight cleaning it up and making sure it can build itself from scratch, as well as converting all the code examples and remaining parts of the standard library.

This is quite likely the direction Reia is going to go. So far the response to abandoning indentation sensitivity has been overwhelmingly positive. It will certainly be a boon for people making template languages as now it will be quite easy to put together something like ERb or a template language which embeds Reia syntax with little worry.

Short of someone making an impassioned plea for preserving indentation sensitivity, it will soon be gone from Reia.

What do Rubyists like about Ruby?

I put together a totally unscientific gimpy little web poll and posted it to the ruby-talk mailing list, asking Rubyists what their favorite Ruby features are. I received a little over 60 responses. Here are the top 10 (at the time this was posted):
  1. Blocks (71%)
  2. Lambdas / Procs (57%)
  3. Mix-ins / Modules (55%)
  4. Open classes / monkeypatching (53%)
  5. Duck-typing (53%)
  6. First-class regexps (49%)
  7. Metaprogramming (49%)
  8. Syntactic sugar e.g. val ||= x; array <<>
  9. Optional parens / English-like readability (44%)
  10. Object#method_missing (42%)
There's no real surprises here. Rubyists love their blocks, but perhaps even more than that, Rubyists love closures. Talking to some other people about lambdas / procs being so high, the general sentiment was that they were a feature which was used somewhat infrequently but was very handy when needed.

The mixin approach to inheritance is also quite popular. This lies alongside open classes and monkeypatching. Rubyists really seem to enjoy the ability to manipulate code inside the runtime environment, and we also see metaprogramming not too far down the list.

First class regexps have always been one of my favorites and I was glad to see they fared well. Beyond that, the type system, syntactic sugar, and method_missing were also popular.

I've left the poll open if you'd still like to vote:

  • Creating Domain Specific Languages
  • Ternary expressions e.g. foo ? bar : baz
  • Object#class / Object#is_a?
  • Object#method(s) / Module#instance_method(s)
  • Reflection
  • Lambdas / procs
  • Mix-ins / Modules
  • Object#instance_eval / Object#instance_exec
  • Redefining the behavior of core types
  • Standard library (and its use of blocks)
  • Instance-specific behavior / metaclasses
  • First class regexps
  • Optional parens / English-like readability
  • Definable MyObject#method=
  • Module#module_eval / Module#class_eval
  • Use of operators by core types (e.g. Array#*)
  • Object#method_missing
  • Operator overloading (in novel ways)
  • RubyGems
  • Open classes / monkeypatching
  • Syntactic sugar e.g. val ||= x; array << val
  • Code in class body (and metaprogramming with it)
  • Duck typing
  • Enumerators (Ruby 1.8.7+/Ruby 1.9)
  • Embedding code in strings with "#{mycode}"
  • Definable MyObject#[] and MyObject#[]=
  • Enumerable
  • Blocks
  • Singleton classes
  • Module#define_method
  • Splats e.g. method(*args)
  • ObjectSpace
  • Metaprogramming
  • Object#instance_variable_get / Object#instance_variable_set
  • Continuations e.g. callcc
  • MyObject#method? and MyObject#method! idiom