Polyglot programming – some lessons learned

With Web Workbench now safely out the door, I thought I’d share some lessons learned from its development. (Rest assured you don’t need to know any of this stuff to use Web Workbench.)

One unusual aspect of the development of Web Workbench is the number of languages we used to develop it. Like most .NET projects, it contains a fair chunk of C#. But quite a bit of the core is written in F#, and it also invokes a large amount of external Ruby and JavaScript code. While using all these languages definitely made it far easier to develop the product, it did also throw up a few challenges and surprises.

How Web Workbench fits together

Web Workbench consists of three major chunks: the language parsers, which work out the information needed for syntax highlighting, intellisense and so on; the third-party compilers, which generate the output CSS and JavaScript files; and the Visual Studio integration, which surfaces these capabilities in the Visual Studio UI.

We chose C# to implement the Visual Studio integration, mainly because the tooling for developing Visual Studio components is C#/VB-centric, and there weren’t any compelling reasons to use anything else. So the rest of this review is from the point of view of integrating non-C# components into a C# framework.

F#

F# is a functional language modelled on OCaml. After shipping from Microsoft Research for several years, it is now included in Visual Studio 2010 out of the box. We used F# to implement the parser components of Web Workbench.

We chose F# for a couple of reasons. The first was that constructs like discriminated unions, records and pattern matching would reduce the amount of boring boilerplate code we had to write. In particular, they provide a great way to quickly represent and traverse an abstract syntax tree (AST). The second was the availability of parser libraries. For a previous project we used fslex and fsyacc, external parser generation tools which come with the F# PowerPack. For Web Workbench we switched over to Stephan Tolksdorf’s beautiful parser combinator library, FParsec. FParsec made it easy to build up higher-level constructs such as “pair” or “with position”, and to parameterise parsers, for example to reflect the different prefixes for variables in Sass and Less. It’s probably a subject for a separate blog post, but FParsec alone is a compelling reason to use F# for this kind of project!
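As an illustration, here’s a minimal sketch of the kind of parameterised parser FParsec makes easy. The names here are hypothetical (our real combinators were more involved), but the shape is representative:

```fsharp
open FParsec

// Hypothetical sketch: a variable parser parameterised on its prefix
// character, so one definition serves both Sass ("$foo") and Less ("@foo").
let variable prefix : Parser<string, unit> =
    pchar prefix >>. many1Satisfy isLetter

// "withPosition" decorates any parser with the position where its match began.
let withPosition (p: Parser<'a, unit>) : Parser<Position * 'a, unit> =
    getPosition .>>. p

let sassVariable = withPosition (variable '$')
let lessVariable = withPosition (variable '@')
```

Because parsers are just values, parameterising on something like a prefix character is a one-liner rather than a change to a grammar file.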

Interoperating F# with C# is mostly pretty easy. F# assemblies are normal .NET assemblies and expose classes and functions in the standard way. However, there are a few wrinkles and inconveniences.

Discriminated union members

Suppose you define a discriminated union in F# along the lines of:

type AstNode =
| SelectorNode of string * AstNode list
| ErrorNode of string
| OtherNode

In C#, you’ll see an AstNode base class, with derived nested classes named SelectorNode and ErrorNode, and a static property named OtherNode for the no-argument case. ErrorNode has a single property named Item, and SelectorNode has properties named Item1 and Item2.

public class AstNode
{
  public class SelectorNode : AstNode
  {
    public string Item1 { get; }
    public FSharpList<AstNode> Item2 { get; }
  }
  public class ErrorNode : AstNode
  {
    public string Item { get; }
  }
  public static AstNode OtherNode { get; }
}

If you find yourself working with these classes from C#, the names are pretty unmemorable and you need to be careful about maintenance as you add or remove members. It’s therefore worth thinking carefully about how to partition work between the two languages. Ideally you want C# to be able to treat the F# classes as opaque objects, so that whenever it needs to do anything with them, it hands them back to F#.

This isn’t always practical, though. In particular, when unit testing the parser, we wanted to be able to make assertions about the AST that results from a given input. We could have written the tests in F#, but the tooling for unit testing is much better in C#. So instead we wrote a bunch of extension methods for the various AST node types that mapped directly onto the Item members, and this seemed to work pretty well.
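For illustration, here’s a sketch of what those test helpers looked like, assuming the AstNode union above (the method names are hypothetical):

```csharp
using Microsoft.FSharp.Collections;  // FSharpList<T>

// Hypothetical test-only helpers: each extension method gives a memorable
// name to one of the compiler-generated Item properties.
public static class AstNodeExtensions
{
    public static string Name(this AstNode.SelectorNode node)
    {
        return node.Item1;
    }

    public static FSharpList<AstNode> Children(this AstNode.SelectorNode node)
    {
        return node.Item2;
    }

    public static string Message(this AstNode.ErrorNode node)
    {
        return node.Item;
    }
}
```

A test can then read selector.Name() instead of selector.Item1, which survives refactoring of the union much better.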

This was only for tests, and our production code managed to almost entirely avoid peeking into the structure of AST nodes, by dint of writing most node processing code in F#.

F# and C# functions

One issue that this raised was passing C# functions into F#. For example, our AST traversal code, which needed to understand the structure of AST node types, was written in F#, but one of its arguments was a visitor function, and sometimes we needed that visitor function to be written in C# because it was part of the Visual Studio integration framework. Unfortunately, C# passes function arguments as delegates, whereas F# expects function arguments to be of F# function types. (Internally, F# doesn’t represent function types as delegates — it has its own function type named FSharpFunc.)

To get around this, we created a C#-friendly version of the traversal function, which took a delegate, created an F# lambda to call that delegate, and passed that lambda to the real traversal function. Another handy tool is FSharpFunc.FromConverter, a built-in way to convert a delegate to an F# function.
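A sketch of the wrapper approach, assuming the real F# traversal function is called traverse (the names here are hypothetical):

```fsharp
// Hypothetical sketch: "traverse" is the real traversal function, which
// takes an F# visitor function. This C#-facing wrapper accepts a BCL
// delegate instead, and wraps it in an F# lambda before delegating.
let TraverseWithVisitor (visitor: System.Action<AstNode>) (root: AstNode) =
    traverse (fun node -> visitor.Invoke node) root
```

Going the other way, FSharpFunc<T, TResult>.FromConverter turns a Converter<T, TResult> delegate directly into an F# function value, without needing a wrapper on the F# side.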

Options and lists

I generally refer to the F# option type as “nullable types done right” — safer and more self-documenting than null references or nullable value types. Using an F# option from C# is pretty easy, but building one is rather ugly. In the few cases where one of our integration points took an option, we implemented builder functions in F# to avoid having to write out direct invocations of the FSharpOption type.

The experience with F# lists was pretty similar. Throwing a ToList() at an FSharpList quickly got rid of the F#-ness when we were consuming them, but building them is something you just don’t want to do in C#.
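A sketch of the builder-function idea (module and function names hypothetical):

```fsharp
// Hypothetical interop helpers: C# calls these instead of spelling out
// FSharpOption<T> and FSharpList<T> construction itself.
module Builders =
    let SomeOf (value: 'a) : 'a option = Some value
    let NoneOf<'a> () : 'a option = None
    let ListOf (items: seq<'a>) : 'a list = List.ofSeq items
```

On the C# side, Builders.SomeOf(42) reads a lot better than newing up FSharpOption<int> by hand, and keeps the F#-isms on the F# side of the boundary.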

Dependencies

One small but annoying surprise we got when we shipped Web Workbench was that, by default, the FSharp.Core runtime doesn’t get copied to the output directory — and that the Visual Studio installer skips the F# redistributables if the user chooses not to install F#. This led to some hard-to-reproduce errors on machines without F# installed. So set FSharp.Core to copy local, or use the compiler’s “standalone” flag, even for VSIX projects.
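For reference, the copy-local setting is just a project-file tweak, something along these lines (exact reference details will vary with your F# version):

```xml
<!-- Ensure FSharp.Core lands in the output directory (and hence the VSIX),
     so the extension still works when the user deselected F# at install time. -->
<Reference Include="FSharp.Core">
  <Private>True</Private>
</Reference>
```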

Ruby

The Sass compiler is built in Ruby, and naturally we wanted to use it as it stood rather than rewriting it. In fact, we originally planned to use the Ruby compiler to do our parsing, but we found it was not fast enough to keep up with users typing, and it stopped after the first error, whereas we need to recover in order to provide syntax highlighting on the rest of the document even after an error. (Especially since the document would spend a lot of its time in error while the user was in the middle of typing something!) So we dropped back to implementing our own parser for real-time aspects such as highlighting and intellisense, but still wanted to use the real compiler to ensure full fidelity — and to save ourselves a lot of effort!

Deployment

To avoid requiring our users to install Ruby and Sass, we decided to use IronRuby, Microsoft’s implementation of Ruby on .NET, to run the compiler. This allowed us to xcopy deploy IronRuby as part of the Web Workbench VSIX, obviating the need for a separate install (and coincidentally ensuring we controlled which version of IronRuby we ran under!).

However, IronRuby depends at runtime on a bunch of Ruby library source files. Similarly, Sass ships as a whole tree of Ruby source files. We needed to ensure that all these files were available at runtime, and xcopy assembly deployment wasn’t going to handle that for us. We could have included the source files in the VSIX directly, but the sheer number of files would have made that a maintenance headache. Our solution was to create zip files of the IronRuby and Sass trees, and unzip these at runtime. This incurs a small overhead the first time the compiler runs, but in practice this isn’t significant.
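A sketch of the unzip-on-first-run idea (using the modern System.IO.Compression API for brevity; the helper name is hypothetical):

```csharp
using System.IO;
using System.IO.Compression;  // ZipFile

// Hypothetical sketch: unpack the bundled IronRuby/Sass sources the first
// time the compiler runs, and reuse the unpacked copy thereafter.
static string EnsureUnzipped(string zipPath, string targetDirectory)
{
    if (!Directory.Exists(targetDirectory))
    {
        ZipFile.ExtractToDirectory(zipPath, targetDirectory);
    }
    return targetDirectory;
}
```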

One small but crucial detail is that since IronRuby is being hosted as a DLL rather than running from the command line, it needs to be told where to find the unzipped runtime files. This involves a call to ScriptEngine.SetSearchPaths.
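Assuming the library trees have been unzipped to rubyLibPath and sassLibPath (hypothetical variables), the engine setup looks roughly like this:

```csharp
// Create a hosted IronRuby engine and point it at the unzipped libraries,
// since a hosted engine has no command line to infer search paths from.
var engine = IronRuby.Ruby.CreateEngine();
engine.SetSearchPaths(new[] { rubyLibPath, sassLibPath });
```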

Strings

The .NET Dynamic Language Runtime generally tries to map common CLR types such as integers and strings to the corresponding types in the language at hand. For the most part, this works pretty transparently, but sometimes the mismatch can throw up some odd errors. For example, when we set the Sass :load_paths option so that it could resolve @imported files, we started seeing weird “wrong number of arguments (2 for 1)” errors from the Sass compiler.

In IronRuby, strings should be represented as the built-in MutableString type. However, when using APIs such as ScriptScope.SetVariable, it’s possible to pass a .NET string. IronRuby won’t complain about this, and most of the time it will work. We’re not sure exactly what went wrong on the Sass :load_paths case, but we suspect it may have passed an is_a? String test despite not being a true mutable Ruby string. So the lesson is always to wrap strings using MutableString.CreateAscii before passing them to SetVariable.

A similar issue applies to strings returned from Ruby scripts. You can’t cast them to .NET strings, because they are instances of MutableString rather than String. Use ToString() instead.
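Both directions in one sketch (scope, compileScript and sassDirectory are hypothetical stand-ins for our real hosting code):

```csharp
using IronRuby.Builtins;  // MutableString

// Going in: wrap CLR strings as Ruby-native mutable strings.
scope.SetVariable("sass_dir", MutableString.CreateAscii(sassDirectory));

// Coming out: the result is a MutableString, not a System.String,
// so convert rather than cast.
object rubyResult = engine.Execute(compileScript, scope);
string css = rubyResult.ToString();
```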

Ruby versioning and compatibility

IronRuby 1.1.1 implements Ruby 1.9. However, although the implementation is ‘complete enough for Rails,’ it’s not 100% complete. We got a nasty surprise when the Sass compiler started raising missing method exceptions. There’s a place in Sass where it calls the chr function. Under 1.8 it calls this on a Fixnum, but when it thinks it’s running under 1.9, it calls it on a String. And IronRuby 1.1.1 doesn’t implement String.chr.

We were able to solve this problem fairly quickly once we’d identified it, by running some Ruby code during initialisation to re-open the String class and add the missing chr method. The lesson is more to be aware that third-party implementations of Ruby may not be 100% compatible — and again to carefully read exception messages!
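The patch itself is ordinary Ruby monkey-patching, along these lines (a sketch of the idea rather than our exact initialisation code):

```ruby
# Re-open String and supply chr when the host implementation lacks it
# (as IronRuby 1.1.1 did). chr returns the first character of the string.
class String
  unless method_defined?(:chr)
    def chr
      self[0, 1]
    end
  end
end
```

The method_defined? guard means the patch is a no-op on Ruby implementations that already provide chr, so the same initialisation code runs safely everywhere.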

(In most cases, Sass does detect IronRuby and falls back to 1.8 behaviour — this just seems to be a case that slipped through the net, but I thought I’d mention it because other Ruby libraries may not be aware of IronRuby’s quirks the way Sass is.)

Exceptions

Talking of exceptions, we struggled for a while to get meaningful error information out of Sass syntax error exceptions. This wasn’t because Sass didn’t provide the information, it was just that IronRuby didn’t make it easy to find. There are a couple of tricks to be aware of here.

First, the Ruby exception data isn’t presented as a nice simple property on the exception object. Instead, you need to call a static method on RubyExceptionData.

RubyExceptionData rubyException = RubyExceptionData.GetInstance(ex);

Once you have a RubyExceptionData object, you can get the Ruby backtrace, which is invaluable for diagnostics!

Second, when Sass sets exception attributes such as the line number in the SCSS source file, these are available as dynamic properties, but it’s not always obvious what dynamic property you need. I found the following snippet useful for listing the members available on an IronRuby exception that had propagated up into C#:

((System.Dynamic.IDynamicMetaObjectProvider)ex).GetMetaObject(System.Linq.Expressions.Expression.Constant(ex)).GetDynamicMemberNames()

(The fully qualified type names are so that you can paste it into the Visual Studio Immediate window while debugging.) Armed with the list of Ruby members it was usually easy to figure out the dynamic call I needed:

object scssLine = ((dynamic)ex).sass_line();

JavaScript

In the same way as we run the real Ruby Sass compiler on our SCSS files, we run the real JavaScript CoffeeScript compiler on our CoffeeScript files. We looked at three options for this: running on Node.js under Cygwin, running on the DLR using IronJS, and running on a ground-up implementation using Jurassic. The Node approach turned out awfully kludgy, involving as it did launching a separate process through an exciting collection of command scripts, so as with Ruby we turned to the DLR. Unfortunately, IronJS wasn’t able to run the CoffeeScript compiler (though the team are looking at the issue and I believe they now have a fix), so Jurassic it was. Jurassic uses a lot of the same terminology as the DLR so it’s a familiar programming experience, but it doesn’t actually use the DLR under the covers so you can’t use DLR tricks like C# dynamic and IDynamicMetaObjectProvider against it.

Still, running the compiler under Jurassic worked nicely. The CoffeeScript compiler is available as a standalone JavaScript file: we could just load that in from a file or string resource and we were good to go. However, as we had initially planned with Sass, we also wanted to use the CoffeeScript parser rather than writing our own, and that turned out to be a bit more challenging, as we could no longer use the standalone compiler — and the ‘non-standalone’ version worked only on Node.

Running Node modules under Jurassic

The solution we adopted was to implement enough of the Node environment to make the CoffeeScript parser happy. Fortunately, there wasn’t very much of this: in fact, it turned out all we needed to do was implement the require function, which the CoffeeScript parser uses to read in the other files it depends on such as the lexer.

This was reasonably easy to do using Jurassic’s ScriptEngine.SetGlobalFunction method. This allowed us to implement file content resolution in C# or F#, where we had access to the System.IO and System.Resources APIs, and assign that resolution function to the name ‘require’. Then JavaScript calls to require were handled by the host function, which loaded the content of the ‘required’ file, passed it back to Jurassic for execution, and returned the results to the requirer.
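A sketch of the shim; ResolveModulePath is a hypothetical host-side helper that maps a module name to a file or embedded resource:

```csharp
using System;
using System.IO;

// Hypothetical sketch of the require shim: Jurassic calls back into the
// host, which loads the requested source and executes it in the engine.
var engine = new Jurassic.ScriptEngine();
engine.SetGlobalFunction("require", new Func<string, object>(name =>
{
    string source = File.ReadAllText(ResolveModulePath(name));
    // Wrap the module source so that whatever it adds to "exports" is
    // handed back to the requiring script.
    return engine.Evaluate("(function (exports) { " + source + "\nreturn exports; })({})");
}));
```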

The technique of mapping Node externals to host functions should allow arbitrary Node modules to be run under Jurassic, though it could require a fair bit of work if the module has a lot of Node-specific dependencies!

Strings

Similar to Ruby, strings returned from JavaScript aren’t always .NET System.Strings. (Sometimes they are, sometimes they aren’t. It depends on how they were constructed inside the JavaScript code.) Again, use ToString() rather than trying to cast to string.

Strange things in close up

JavaScript is famed for its quirks and pitfalls, and Jurassic makes it easy to enjoy them in the usually drearily regular setting of the CLR. For example, would you guess that these do the same thing?

engine.Execute("a = { }");
engine.SetGlobalValue("a", engine.Evaluate("{ }"));

Of course they don’t. The first sets a to an empty object, as you would expect. The second sets a to undefined. So when injecting code into Jurassic from the host, watch out for JavaScript evaluation quirks!
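The behaviour isn’t a Jurassic quirk, it’s standard JavaScript: a leading “{” begins a block statement, so “{ }” parses as an empty block rather than an object literal. Plain JavaScript shows the same thing:

```javascript
// A leading "{" begins a block statement, so "{ }" evaluates as an empty
// block (completion value undefined), while parentheses force it to parse
// as an object literal expression.
const asStatement = eval("{ }");
const asExpression = eval("({ })");

console.log(asStatement);          // undefined
console.log(typeof asExpression);  // "object"
```

So when handing expression snippets to Evaluate, wrapping them in parentheses is a cheap way to stay out of statement-parsing territory.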

Conclusion

The polyglot approach is tremendously powerful, but inevitably the integration isn’t completely seamless. That’s probably unrealistic: different languages have different conceptual models, and different semantics even for supposedly common types such as strings.

I’m conscious that this article has focused on the difficulties and surprises, and I want to re-emphasise that the polyglot approach was definitely a net win for us. We simply could not have delivered Web Workbench as a monolingual program. The parsers that underlie the syntax highlighting and intellisense would have been much harder to develop, even with the help of a C#-friendly toolkit like ANTLR, and we couldn’t even have attempted the compilers. I guess we could have shelled out to Ruby and Node but the deployment issues would have been horrible, and of course that’s really just concealing the polyglot nature rather than getting rid of it.

The boundaries between the C# and the Ruby/JavaScript code were clean and well defined. Integrating our own F# code into C# was a more interesting design exercise: as I mentioned, some idioms don’t translate well, so identifying the right boundary between the two languages is very important to keeping each side idiomatic and avoiding warts like delegates in F# and ItemX calls in C#. We ended up shuttling things back and forth across the language boundary quite a lot in order to do the right processing in the right place, but the experience is generally pretty seamless.

Based on our experience with Web Workbench, polyglot programming isn’t something you should do lightly. If you’ve only got one or two files that would be better off as F# or Ruby or JavaScript, then it’s probably more efficient to port them to your language of choice than to deal with the integration and deployment issues. But it’s nothing to be afraid of, either. In this post I’ve tried to alert you to some of the things you’ll encounter and how we solved them, but I’ve also tried to emphasise that the difficulties were outweighed by the benefits. We hope this will encourage you to keep your eyes open for places where polyglot programming can help you too!

Responses to “Polyglot programming – some lessons learned”

  • I started out with Jurassic running the CoffeeScript compiler in this project…
    http://coffeemonitor.codeplex.com/

    But it was taking 11 seconds to compile six *.coffee files. Switching to an embedded version of the V8 engine (that I borrowed from the SassAndCoffee project) got that down to 0.4 seconds or so.

  • Good to see some fellow Wellingtonians using my pet project, Jurassic! You probably already know this, but for those who don’t: engine.Evaluate("{ }") returns undefined because ‘{’ starts a new scope if it is the first token on a line. engine.Evaluate("({ })") does return an empty object.

  • Ah, thanks for the tip, Paul (and thanks for building Jurassic)! I guessed it must be something like that, but I’d never have guessed the answer. Guess I need to brush up on my JavaScript…

  • Don’t worry, that particular feature sits in an old and dusty corner, with only compiler writers looking after it :-)

    Here’s how you use it as a poor man’s goto:
    label: { console.log("1"); break label; console.log("2"); } –> prints "1"
    The break jumps to the end of the block, not the start. It doesn’t create a new variable scope, or anything like that. Pretty useless really :-/

    Another feature that I found surprising:
    (function() { var x = 5; return delete x; })() –> returns false, x is unchanged.
    (function() { eval("var x = 5"); return delete x; })() –> returns true, x is deleted.
