Bitwise Magazine :: Visual Studio 2005 Integration / Extensibility


VISUAL STUDIO 2005 INTEGRATION

Colouring code can help you make sense of a complicated structure. With Visual Studio Extensibility (VSIP), you can add your own colourisation to a language. Dermot Hogan explains how...

In this series, Dermot explains how to integrate a programming language into Visual Studio 2005. Part Two: Colour Coding (see also: Part One and Part Three)

Visual Studio provides two ways of adding colour to your code. One uses the traditional COM interfaces directly, for which you would probably use C++. The other is via the Managed Package Framework (MPF). Although both rest on the same fundamental Visual Studio interfaces, you’ll typically find that working with the COM interfaces directly leads you to one approach, while using the MPF leads you to another.

Microsoft does not include any parsing tools with Visual Studio, nor does it recommend any: you either have to write your own or use external tools. The most widely available tools (and the ones that have been around the longest) are Lex and Yacc, which originated at AT&T’s Bell Labs on Unix in the 1970s. Their more modern forms are Flex and Bison respectively. Flex and Bison are available from several places on the web (try here: http://dinosaur.compilertools.net) and are free, though they do come with some licensing restrictions (Bison, for example, is distributed under the GNU GPL). They are essentially C or C++ tools: given an input ‘grammar’, they produce C or C++ output files, which you then compile in the usual way and incorporate into your Visual Studio project. The Visual Studio SDK includes a package, the Babel package, that has hooks into the code produced by Bison/Flex, and using Babel you can wire up a simple language colouriser quite quickly.

The other alternative, via the MPF, means that you have to find a C# variant of Bison and Flex. I found a couple, but the results were not really satisfactory and, after some experimentation and a dead end or two, I abandoned this line. There is, however, a much better way to work with the MPF, and that’s to use a parser generator called Antlr (www.antlr.org). There are two things to consider when using Antlr. The first is that getting an MPF lexer going is more complicated. The other ‘problem’ is that Antlr generates ‘recursive descent’ LL(k) parsers, which are quite different from the more common Yacc/Bison LALR(1) parsers. In my view, this isn’t a real problem at all, as an LL(k) parser has many advantages over an LALR one. But be aware of the difference when you start out: it’s not easy to change direction once you are halfway into building your new Visual Studio package.

The track I’ll follow here is the simpler Bison/Flex route, and the example I’ll use to illustrate the techniques is an assembler: specifically, a Microchip assembler for the dsPIC30 digital signal processors. Microchip dsPIC programming is a whole different world from Visual Studio programming, and you might wonder how the two are related.

Fundamentally, I didn’t (and still don’t) like Microchip’s proprietary IDE (MPLAB). I wanted a different way to program my dsPIC chips. One part of this was to integrate Microchip’s GNU-based C compiler into Visual Studio; this is really pretty simple to do. Another part was to host the dsPIC assembler in Visual Studio (this is the part I’ll cover here). The final bit is to program the dsPIC chip from Visual Studio. That’s not so easy! But the end result is that I have a nice Visual Studio based IDE for developing dsPIC devices. And I don’t intend to go back to Microchip’s MPLAB any time soon.

Flexing Your Muscles

Most parsing systems split the job of making sense of an input file into two parts. First, the ‘lexer’ (here Flex) reads the raw input file and spits out a stream of ‘tokens’. The real parser (Bison) then makes sense of the token stream, checking it for correct syntax and usually building a ‘parse tree’ out of the tokens. For simple colourisation the parser isn’t really necessary, but since Flex and Bison work as a pair, it makes sense to have a Bison parser as well as the Flex tokeniser.

An assembler is essentially a line-based language. Each line of assembly instructions stands on its own and isn’t usually related to another line. This makes it quite a bit easier to parse than something like C++ (very, very difficult). The input to Flex is a set of regular expressions which describe how each token is composed. Here are the definitions for a ‘name’ and a ‘label’ in Microchip assembler:

name [a-z_\.][0-9a-z_\.]*
label {name}\:

If you aren’t familiar with regular expressions, in English this reads: ‘the first character of a name is a lower-case letter between a and z, an underscore or a dot; this is followed by zero or more similar characters, which may also include digits’. A label is similar, but it is followed by a colon character. And so on. Regular expressions define the input grammar to Flex, and it’s essential to understand them before writing a lexer or parser. The good news is that the Flex manual is easy to follow and quite thorough. Also, the regular expressions that Flex understands are fairly simple examples of the beasts (there are none of the named extensions or ‘lazy’ matching of more elaborate regular expression engines), so it’s really very quick to build up a lexer that recognises the tokens you want to colour.

In the Flex definition file, you then instruct Flex to emit tokens to Bison like this:

{label} { return LABEL; }

This just tells Flex to generate C code that emits a LABEL token whenever the ‘label’ regular expression is matched. Bison picks this up and decides whether it makes sense in the given context. You generally need some sort of parser even if all you want to do is colour a particular token, because the lexer usually cannot distinguish between tokens that are used in two different contexts. If you rely on the lexer alone to categorise your tokens, you may end up with some odd or undesirable colouring in certain places. Visual Studio itself doesn’t require a parser, but since the Babel package expects the output of Bison, you really do have to implement one, even if it’s a trivial one.

Coding

Once you’ve defined the Flex lexer and a simple Bison parser, you can incorporate them into a DLL which is called from the Visual Studio core. Building this Babel-compatible DLL is easy, since the Visual Studio Language Package wizard builds the skeleton for you. Because Flex and Bison don’t come with Visual Studio, you’ll need to change the default Flex/Bison tool location to wherever you’ve installed them.

If you are just creating a simple colouring service, there are only three methods to implement in the service.cpp file (generated for you by the Language Package wizard). This file contains overrides for the methods in stdservice_.cpp, which does the real work of implementing the IBabelService COM interface. Normally you won’t need to touch the base classes, but they are a useful place to set breakpoints when you want to find out why something hasn’t been coloured correctly.

The first of the three methods you need to override is getCommentFormat, which provides the comment service:

override const CommentFormat* Service::getCommentFormat() const {
       static CommentFormat commentFormat = { ";", NULL, NULL, true };
       return &commentFormat;
}

This really just allows the IDE to comment selections of code with the correct line comment character.

Next, override the getTokenInfo method. This method connects a token type, such as an assembler REGISTER, with a colour class:

override const TokenInfo* Service::getTokenInfo() const {
       static TokenInfo tokenInfoTable[] = {
              { REGISTER, ClassRegister, "operand ('%s')", CharKeyword }, …

Finally, you need to define the colours:

override const ColorInfo* Service::getColorInfo() const {
       static ColorInfo colorInfoTable[] = {
              { ClassRegister, "Text","color: darkgray" }, …

Additionally, you need to declare an enum which is used to index the arrays:

enum MyColorClass {
       ClassOpCode = ClassDefaultLast + 1,
       ClassLabel,
       ClassDirective,
       ClassSymbol,
       ClassLiteral,
       ClassOperator,
       ClassSpecial,
       ClassRegister
};

Here’s the result – a nicely coloured assembly code fragment hosted in Visual Studio. With my package installed in the Visual Studio IDE, I can now edit assembler and C files, use the Microchip compiler, assembler and linker to create a ‘hex’ file and even program it via a USB link into a Microchip microcontroller:

[Figure: coloured assembler code in the Visual Studio editor]

With a combination of Flex, Bison and the Babel package, it’s relatively easy to build a language colouriser for Visual Studio.

Troubleshooting

It has to be said that while all of the above looks simple and straightforward, debugging it can be tricky. There are really two problem areas. The first is that you build your package, implement the Babel interface, load up your file and... it’s not coloured! Totally monochrome, in fact.

The root of this problem probably lies in the Registry. Visual Studio is COM based, and all Visual Studio extensions use COM interfaces to communicate with external code. When Visual Studio tries to determine whether you want colouring for your code, it looks up the file extension in the Visual Studio part of the Registry, finds the GUID (CLSID) of the COM component that does the colouring and co-creates it (instantiates it via COM). Additionally, the Babel package that comes with the Visual Studio SDK must be installed: look for the BabelPackage.msi installer in the SDK. Registration of your language colouring package is handled automatically by a post-build step (set up by the Language Package wizard), which calls the VS SDK ‘regit.exe’ program to do the business. Typically, if you get no colourisation at all, one of these steps has gone astray. The trouble is that, because this is COM, it can be very difficult and frustrating to find out exactly what is wrong.

The second potential problem is when there is some colourisation, but not quite what you expected. The problem here is usually in the Flex definitions. However, there isn’t an easy way to debug these. Flex and Bison are ‘state-driven’: they use tables of numbers to decide what to do next. With a simple language like an assembler, tracking down these problems is normally quite easy, if a little laborious. It’s a different ball game if you get into a complicated language like C++ or Ruby.

I’ve long been of the opinion that the speed of development is largely determined by how quickly you can debug something. Debugging complex Yacc/Lex state tables isn’t to be undertaken lightly. It’s for this reason that I abandoned the LALR parsers and turned to Antlr, a tool with far better debugging support and ease of use. But for simple, fast colouring, Bison and Flex are hard to beat.

The registry is the root of (most) COM evil. The trouble is that there isn’t a simple way of finding out what is wrong. You have to do it the hard way – by eyeball.

In the next part of this series, Dermot looks at MSBuild...

April 2006
