BW2 :: the bitwise supplement :: http://www.bitwisemag.com/2

DLR - Build Your Own Language (without tears)
A developer’s guide to the Dynamic Language Runtime

3 March 2008

by Dermot Hogan

Dermot Hogan looks at what’s required to build your very own computer language using two new – and remarkable – tools: Microsoft’s Dynamic Language Runtime and ANTLR3 by Terence Parr from the University of San Francisco.



- See also: Part Two of this series

Ever had trouble learning a computer language in the past? Well, why not dump the C++ manuals and start writing your very own language to do exactly what you want? If this sounds impossibly difficult, or only suitable for Computer Science professors – it was, until very recently. But two new developments have made language writing (as opposed to reading) possible for mere mortals. The first is ANTLR3, a totally new version of the highly popular ANTLR2 parser system (http://www.antlr.org/). And the second is Microsoft’s revolutionary Dynamic Language Runtime – the DLR (see: http://www.codeplex.com/IronPython).

So what’s this DLR thing?

It’s not that easy to find out what the DLR is. There’s not much information about it around and what there is is patchy (at best). Initially, I thought that Microsoft had added some new instructions to the existing Common Language Runtime – the CLR – to help with dynamic languages. That’s not the case – the DLR does not alter the CLR in any way. In fact, the DLR is just another program that sits on top of the CLR and runs your dynamic language. It’s like a dynamic language interpreter. But it’s a lot cleverer and a lot faster than the usual run-of-the-mill interpreter. It’s really halfway between a compiler and an interpreter, with some really neat features built in to support dynamic languages.

Incidentally, there’s been some debate about whether it’s better to support dynamic languages within the CLR or Java’s equivalent – the JVM. Sun has an experimental project, the ‘Da Vinci Machine’, that is implementing new instructions on the JVM for just this purpose. However, there’s a big downside to this approach. JVM and CLR instructions are only added after a long, long period of research and testing. And rightly so – these virtual machines have to work – and work flawlessly. And dynamic languages are a very small part of programming. My feeling is that we’ll be waiting some time before dynamic language constructs appear in either the CLR or the JVM. Microsoft’s layering of the DLR on top of the CLR, rather than embedding it, seems a better and more pragmatic approach.

What distinguishes the DLR from an interpreter or compiler is the way it handles the ‘dynamic’ in a dynamic language. Basically, a dynamic language is one where the type of a variable can change at runtime. For example, you can’t do this in Java or C#:

i = 1;
i = "hello";

Of course, there’s quite a bit more to dynamic languages than this: there’s a good description here, but those two lines of code sum up what’s going on in a dynamic language. Essentially, ‘i’ above changes from behaving like an integer to behaving like a string. This isn’t actually a new idea. Visual Basic had ‘variants’ before Microsoft butchered it into VB .NET, and the granddaddy of them all, Lisp, is now 50 years old. The thing is, variable typing of this kind is now considered to be ‘a good thing’. Times change.

The DLR keeps a set of rules on how to handle addition, say, and selects the correct rule to use depending on the types of the operands. The rules are CLR ‘delegates’ – essentially function pointers – which the DLR invokes as necessary. So, if the type of a variable has changed, a new rule will be invoked, like this:

i = 1;
j = 2;
k = i + j;       // integer addition rule invoked
i = "hello";
j = "world";
k = i + j;      // string addition rule invoked
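
The rule selection above can be pictured as a lookup keyed on the operand types. What follows is only a rough sketch in Python, not DLR code: the names `ADD_RULES`, `dynamic_add`, `add_integers` and `concat_strings` are invented for illustration, and plain Python functions stand in for the CLR delegates.

```python
from typing import Any, Callable, Dict, Tuple

Rule = Callable[[Any, Any], Any]


def add_integers(a: int, b: int) -> int:
    """Hypothetical integer addition rule."""
    return a + b


def concat_strings(a: str, b: str) -> str:
    """Hypothetical string addition rule."""
    return "".join((a, b))


# One rule per pair of operand types; the functions stand in for delegates.
ADD_RULES: Dict[Tuple[type, type], Rule] = {
    (int, int): add_integers,
    (str, str): concat_strings,
}


def dynamic_add(left: Any, right: Any) -> Any:
    """Pick the addition rule from the runtime types of the operands."""
    rule = ADD_RULES.get((type(left), type(right)))
    if rule is None:
        raise TypeError(
            f"no addition rule for {type(left).__name__} + {type(right).__name__}"
        )
    return rule(left, right)


i, j = 1, 2
k_int = dynamic_add(i, j)    # integer addition rule invoked: 3
i, j = "hello", "world"
k_str = dynamic_add(i, j)    # string addition rule invoked: 'helloworld'
```

The same call, `dynamic_add(i, j)`, ends up in a different function once ‘i’ and ‘j’ hold strings, which is the behaviour the two halves of the example above rely on.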

Now, the question is – how do you tell the DLR about these rules?

An ANTLR Detour

To use the DLR, you really need to create an AST. An AST – Abstract Syntax Tree – is a symbolic representation of your program. There are two ways to produce an AST – the hard way and using ANTLR. The hard way uses the traditional yacc/flex parser and lexer. Technically, yacc is an LALR parser generator which produces state tables which are then used to build the AST. All I will say on the matter is that there are more interesting ways to spend the few years of life we have than debugging yacc state tables (I write with feeling).

ANTLR, on the other hand, is a ‘recursive descent’ parser, LL(k) in the jargon. Practically, ANTLR produces readable (mostly) code which you can understand and debug. Here, I’ll be looking at ANTLR3, the latest iteration of Terence Parr’s obsession. ANTLR3 is free and written in Java, but produces ‘target’ code for Java, C#, C++, etc. Currently, the release is 3.0.1, but we’ll need the next release, 3.1 (still in beta – but due ‘real soon now’), to really link into the DLR properly.

Generating the AST for the DLR using ANTLR involves three steps. First is the ‘lexer’. A lexer defines the symbols of a language. For example, a C# lexer will distinguish between an integer ‘123’ and a keyword, ‘void’. The second step is the parser. The parser pulls symbols from the lexer and builds ‘statements’ (sentences in a language like English). The parser will distinguish between correct statements and rubbish. For example, the parser will allow statements like: void abc() {} but will complain about abc void(){} even though the lexer generates the same symbols in both cases, just in a different order.
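
To make the split between lexer and parser concrete, here is a small hand-written sketch in Python. It is not ANTLR output and uses no ANTLR API; the token kinds (`INT`, `KEYWORD`, `IDENT`, `PUNCT`) and the functions `lex` and `parse_function` are made up for this illustration, and the ‘grammar’ accepts only an empty void function.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)

KEYWORDS = {"void"}
# Optional whitespace, then an integer, a word, or one punctuation symbol.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([(){}]))")


def lex(source: str) -> List[Token]:
    """Split the source text into symbols and classify each one."""
    tokens: List[Token] = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, word, punct = match.groups()
        if number is not None:
            tokens.append(("INT", number))
        elif word is not None:
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENT", word))
        else:
            tokens.append(("PUNCT", punct))
        pos = match.end()
    return tokens


def parse_function(tokens: List[Token]) -> str:
    """Accept exactly `void name ( ) { }` and return the function name."""
    expected: List[Tuple[str, Optional[str]]] = [
        ("KEYWORD", "void"), ("IDENT", None), ("PUNCT", "("),
        ("PUNCT", ")"), ("PUNCT", "{"), ("PUNCT", "}"),
    ]
    if len(tokens) != len(expected):
        raise SyntaxError("wrong number of symbols")
    for (kind, text), (want_kind, want_text) in zip(tokens, expected):
        if kind != want_kind or (want_text is not None and text != want_text):
            raise SyntaxError(f"unexpected {text!r}")
    return tokens[1][1]


good = lex("void abc() {}")
bad = lex("abc void(){}")
name = parse_function(good)   # 'abc'
```

Both inputs lex without complaint and yield the same six symbols in a different order; only `parse_function(bad)` raises a `SyntaxError`, because the ordering rules live in the parser and not in the lexer.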

The third stage is the ‘tree grammar generator’. The ANTLR parser will produce an AST, but the tree grammar tells ANTLR how to ‘walk’ the AST and generate DLR code.
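
As a rough picture of what ‘walking’ the tree means, the Python sketch below visits each node of a tiny AST once and returns a callable that stands in for the generated code. Everything here is an assumption made for illustration: the nested-tuple tree shape, the `OPERATORS` table and the `walk` function are not what ANTLR or the DLR actually produce.

```python
import operator
from typing import Any, Callable

# Hypothetical operator table; the keys are the operator symbols in the tree.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def walk(node: Any) -> Callable[[], Any]:
    """Visit a node once and return a callable that evaluates it.

    A node is either a literal (leaf) or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return lambda: node                  # leaf: just yield the literal
    op, left, right = node
    apply = OPERATORS[op]
    left_code = walk(left)                   # walk the children first
    right_code = walk(right)
    return lambda: apply(left_code(), right_code())


# AST for 1 + 2 * 3, as a parser might hand it over.
tree = ("+", 1, ("*", 2, 3))
code = walk(tree)
result = code()   # 7
```

The tree is walked only once, by `walk`; calling `code()` afterwards runs the generated closures without looking at the tree again.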

This might sound a bit long-winded, but in fact it isn’t. ANTLR3 syntax is mostly pretty straightforward and, with a couple of examples, it’s very easy to get started and use.

Linking into the DLR

Once you’ve built the ANTLR lexer-parser-generator, you need to create objects that the DLR will use. The DLR uses these objects in a two-pass operation. First, the object is created as the DLR walks the ANTLR-generated AST. This is a standard object-oriented create operation. Here is the constructor for a binary operator in the simple example language I’ll start discussing next month:

public MyLBinary(SourceSpan span, int op, MyLExpression left, MyLExpression right)
 : base(span) {
 _op = op;
 _left = left;
 _right = right;
}

The next thing is the ‘rule’ that is to be invoked when the language construct is encountered:

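protected internal override Expression Generate() {
 // Generate the DLR expressions for the two operands first.
 Expression left = _left.Generate();
 Expression right = _right.Generate();

 // Map the ANTLR lexer token type onto the matching DLR operator.
 Operators op;
 switch (_op) {
   case TestLexerLexer.PLUS:
     op = Operators.Add;
     break;
   case TestLexerLexer.MINUS:
     op = Operators.Subtract;
     break;
   case TestLexerLexer.MULTIPLY:
     op = Operators.Multiply;
     break;
   case TestLexerLexer.DIVIDE:
     op = Operators.Divide;
     break;
   default:
     // Any other token is not a binary operator of this language.
     throw new System.InvalidOperationException();
 }
 // Emit a dynamic operator action over the two operand expressions.
 return Ast.Action.Operator(op, typeof(object), left, right);
}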

Here, the ‘Generate’ method defines what the DLR is to do for a binary operation. Now the DLR doesn’t call your Generate method every time it encounters a binary addition, say. Instead, it generates CLR code from your rule and caches that in the DLR rule cache. So, in general, your rule is only called once, compiled by the DLR into bytecode, and that bytecode is what runs. This is in contrast to a normal interpreter, where the above rule would be called every time a binary operation is encountered.
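
The ‘generate once, then run the cached code’ idea can be sketched in a few lines of Python. This is a simplified model and not the DLR’s rule cache: `CallSite`, `generate_add` and the `generate_calls` counter are hypothetical names, and the cache is assumed to be keyed on the operand types.

```python
import operator
from typing import Any, Callable, Dict, Tuple

Rule = Callable[[Any, Any], Any]


def generate_add(left_type: type, right_type: type) -> Rule:
    """Build the addition rule for one pair of operand types."""
    if left_type is int and right_type is int:
        return operator.add       # integer addition
    if left_type is str and right_type is str:
        return operator.concat    # string concatenation
    raise TypeError("no addition rule for these operand types")


class CallSite:
    """Hypothetical stand-in for one '+' in the program, with its own cache."""

    def __init__(self, generate: Callable[[type, type], Rule]) -> None:
        self._generate = generate
        self._cache: Dict[Tuple[type, type], Rule] = {}
        self.generate_calls = 0   # how often the rule had to be generated

    def invoke(self, left: Any, right: Any) -> Any:
        key = (type(left), type(right))
        rule = self._cache.get(key)
        if rule is None:
            # Cache miss: generate the rule once and remember it.
            self.generate_calls += 1
            rule = self._generate(*key)
            self._cache[key] = rule
        return rule(left, right)


site = CallSite(generate_add)
totals = [site.invoke(n, 1) for n in range(1000)]
calls_after_ints = site.generate_calls     # 1: generated once, reused 999 times
greeting = site.invoke("hello", "world")   # new operand types: one more rule
calls_after_strings = site.generate_calls  # 2
```

A plain interpreter would do the equivalent of `generate_add` on every one of the thousand additions; here it runs once per combination of operand types.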

Summing up

So in essence, we do the following:

- Write a lexer-parser-generator in ANTLR3 that defines the AST structure that will be ‘walked’ by the DLR.

- The DLR then initially walks that structure creating objects that correspond to operations in your dynamic language.

- Then, as your language is executed, the DLR calls your objects to obtain the rules – the code that must be run – when it encounters a given language construct. It calls each rule only once and compiles code to handle it.

There’s quite a bit more to the DLR than this, and over the next few articles, I’ll go into the inner workings of the DLR in more detail. But from the point of view of implementing a simple language, the overview I’ve outlined above is adequate.

Next month, I’ll go through a very basic example of how to produce a working language using ANTLR3 and the DLR.


Dermot Hogan is the Chief Architect of SapphireSteel Software’s Ruby In Steel IDE for Visual Studio - including an IDE (in development) for Microsoft’s DLR-based language, IronRuby and a new programming language, Sapphire, which will run on the DLR.