Polyglot programming – some lessons learned

With Web Workbench now safely out the door, I thought I’d share some lessons learned from its development. (Rest assured you don’t need to know any of this stuff to use Web Workbench.)

One unusual aspect of the development of Web Workbench is the number of languages we used to develop it. Like most .NET projects, it contains a fair chunk of C#. But quite a bit of the core is written in F#, and it also invokes a large amount of external Ruby and JavaScript code. While using all these languages definitely made it far easier to develop the product, it did also throw up a few challenges and surprises.

How Web Workbench fits together

Web Workbench consists of three major chunks: the language parsers, which work out the information needed for syntax highlighting, intellisense and so on; the third-party compilers, which generate the output CSS and JavaScript files; and the Visual Studio integration, which surfaces these capabilities in the Visual Studio UI.

We chose C# to implement the Visual Studio integration, mainly because the tooling for developing Visual Studio components is C#/VB-centric, and there weren’t any compelling reasons to use anything else. So the rest of this review is from the point of view of integrating non-C# components into a C# framework.


F# is a functional language modelled on OCaml. After shipping from Microsoft Research for several years, it is now included in Visual Studio 2010 out of the box. We used F# to implement the parser components of Web Workbench.

We chose F# for a couple of reasons. The first was that constructs like discriminated unions, records and pattern matching would reduce the amount of boring boilerplate code we had to write. In particular, they provide a great way to quickly represent and traverse an abstract syntax tree (AST). The second was the availability of parser libraries. For a previous project we used fslex and fsyacc, external parser generation tools which come with the F# PowerPack. For Web Workbench we switched over to Stephan Tolksdorf’s beautiful parser combinator library, FParsec. FParsec made it easy to build up higher-level constructs such as “pair” or “with position” and to parameterise parsers, for example to reflect the different prefixes for variables in Sass and Less. It’s probably a subject for a separate blog post, but FParsec alone is a compelling reason to use F# for this kind of project!

Interoperating F# with C# is mostly pretty easy. F# assemblies are normal .NET assemblies and expose classes and functions in the standard way. However, there are a few wrinkles and inconveniences.

Discriminated union members

Suppose you define a discriminated union in F# along the lines of:

type AstNode =
| SelectorNode of string * AstNode list
| ErrorNode of string
| OtherNode

In C#, you’ll see an AstNode base class, with derived nested classes named SelectorNode and ErrorNode, and a property named OtherNode. ErrorNode has a single property named Item, and SelectorNode has properties named Item1 and Item2.

public class AstNode
{
  public class SelectorNode : AstNode
  {
    public string Item1 { get; }
    public FSharpList<AstNode> Item2 { get; }
  }
  public class ErrorNode : AstNode
  {
    public string Item { get; }
  }
  public static AstNode OtherNode { get; }
}

If you find yourself working with these classes from C#, the names are pretty unmemorable and you need to be careful about maintenance as you add or remove members. It’s therefore worth thinking carefully about how to partition work between the two languages. Ideally you want C# to be able to treat the F# classes as opaque objects, so that whenever it needs to do anything with them, it hands them back to F#.

This isn’t always practical, though. In particular, when unit testing the parser, we wanted to be able to make assertions about the AST that results from a given input. We could have written the tests in F#, but the tooling for unit testing is much better in C#. So instead we wrote a bunch of extension methods for the various AST node types that mapped directly onto the Item members, and this seemed to work pretty well.

This was only for tests, and our production code managed to almost entirely avoid peeking into the structure of AST nodes, by dint of writing most node processing code in F#.
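To give a flavour of what F#-side node processing looks like, here is a minimal sketch using the AstNode union from earlier. The collectErrors function and the sample tree are hypothetical, made up for illustration rather than taken from Web Workbench:

```fsharp
// AstNode mirrors the union shown earlier in the post.
type AstNode =
    | SelectorNode of string * AstNode list
    | ErrorNode of string
    | OtherNode

// Hypothetical helper: gather every error message in the tree, depth first.
let rec collectErrors node =
    match node with
    | SelectorNode (_, children) -> List.collect collectErrors children
    | ErrorNode message -> [ message ]
    | OtherNode -> []

// A made-up tree with two errors at different depths.
let sample =
    SelectorNode ("div",
                  [ ErrorNode "unexpected '}'"
                    SelectorNode ("p", [ OtherNode; ErrorNode "missing ';'" ]) ])

printfn "%A" (collectErrors sample)
```

Pattern matching names the parts of each case directly, so none of this code ever mentions Item1 or Item2.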

F# and C# functions

One issue that this raised was passing C# functions into F#. For example, our AST traversal code, which needed to understand the structure of AST node types, was written in F#, but one of its arguments was a visitor function, and sometimes we needed that visitor function to be written in C# because it was part of the Visual Studio integration framework. Unfortunately, C# passes function arguments as delegates, whereas F# expects function arguments to be of F# function types. (Internally, F# doesn’t represent function types as delegates — it has its own function type named FSharpFunc.)

To get around this, we created a C#-friendly version of the traversal function, which took a delegate, created an F# lambda to call that delegate, and passed that F# lambda to the real traversal function. Another handy tool is the FSharpFunc.FromConverter function, which is a built-in way to convert a Converter delegate to an F# function.
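The wrapper pattern can be sketched roughly as follows. This is an illustration under assumptions, not the actual Web Workbench code: the names traverse and traverseWithDelegate are hypothetical, and I have assumed the visitor returns nothing, so the delegate type is Action<AstNode>.

```fsharp
open System

type AstNode =
    | SelectorNode of string * AstNode list
    | ErrorNode of string
    | OtherNode

// The "real" traversal: takes a native F# function as the visitor.
let rec traverse (visit: AstNode -> unit) (node: AstNode) : unit =
    visit node
    match node with
    | SelectorNode (_, children) -> List.iter (traverse visit) children
    | ErrorNode _
    | OtherNode -> ()

// C#-friendly entry point (hypothetical name): accepts a delegate and wraps
// it in an F# lambda before handing it to the real traversal.
let traverseWithDelegate (visitor: Action<AstNode>) (node: AstNode) : unit =
    traverse (fun n -> visitor.Invoke n) node

// Simulate a C# caller by constructing the delegate explicitly.
let visited = ResizeArray<AstNode>()
let callback = Action<AstNode>(fun n -> visited.Add n)
traverseWithDelegate callback (SelectorNode ("a", [ OtherNode; ErrorNode "oops" ]))
printfn "visited %d nodes" visited.Count
```

The C# side only ever sees an ordinary method taking an Action<AstNode>, and the F# side keeps working with plain function values.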

Options and lists

I generally refer to the F# option type as “nullable types done right”: safer and more self-documenting than null references or nullable value types. Using an F# option from C# is pretty easy, but building one is rather ugly. In the few cases where one of our integration points took an option, we implemented builder functions in F# to avoid having to write out direct invocations of the FSharpOption type.

The experience with F# lists was pretty similar. Throwing a ToList() at an FSharpList quickly got rid of the F#-ness when we were consuming them, but building them is something you just don’t want to do in C#.
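Builder functions of this kind are tiny. Here is a minimal sketch; the module and function names (Builders, some, none, listOf) are hypothetical and chosen for illustration, not the names we actually shipped:

```fsharp
module Builders =
    // Wrap a value in Some, so callers never have to name FSharpOption<T>.
    let some (value: 'T) : 'T option = Some value

    // Produce None for whatever option type the caller needs.
    let none<'T> () : 'T option = None

    // Build an F# list from any enumerable, such as a C# array or List<T>.
    let listOf (items: seq<'T>) : 'T list = List.ofSeq items

// Exercising the builders from F#; a C# caller would use them the same way.
let port = Builders.some 8080
let missing : string option = Builders.none ()
let paths = Builders.listOf [| "a.scss"; "b.scss" |]
printfn "%A %A %A" port missing paths
```

Keeping these on the F# side means the construction details of options and lists stay out of the C# code entirely.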


One small but annoying surprise we got when we shipped Web Workbench was that, by default, the FSharp.Core runtime doesn’t get copied locally, and the Visual Studio installer skips the F# redistributables if the user chooses not to install F#. On those machines FSharp.Core simply wasn’t there, which led to some hard-to-reproduce errors. So set FSharp.Core to copy local, or use the compiler’s --standalone flag, even for VSIX projects.


The Sass compiler is built in Ruby, and naturally we wanted to use it as it stood rather than rewriting it. In fact, we originally planned to use the Ruby compiler to do our parsing, but we found it was not fast enough to keep up with users typing, and it stopped after the first error, whereas we need to recover in order to provide syntax highlighting on the rest of the document even after an error. (Especially since the document would spend a lot of its time in error while the user was in the middle of typing something!) So we dropped back to implementing our own parser for real-time aspects such as highlighting and intellisense, but still wanted to use the real compiler to ensure full fidelity — and to save ourselves a lot of effort!


To avoid requiring our users to install Ruby and Sass, we decided to use IronRuby, Microsoft’s implementation of Ruby on .NET, to run the compiler. This allowed us to xcopy deploy IronRuby as part of the Web Workbench VSIX, obviating the need for a separate install (and coincidentally ensuring we controlled which version of IronRuby we ran under!).

However, IronRuby depends at runtime on a bunch of Ruby library source files. Similarly, Sass ships as a whole tree of Ruby source files. We needed to ensure that all these files were available at runtime, and xcopy assembly deployment wasn’t going to handle that for us. We could have included these source files in the VSIX directly, but this would have been very heavy maintenance because of the number of files. Our solution was to create zip files of the IronRuby and Sass trees, and unzip these at runtime. This incurs a small overhead the first time the compiler runs, but in practice this isn’t significant.

One small but crucial detail is that since IronRuby is being hosted as a DLL rather than running from the command line, it needs to be told where to find the unzipped runtime files. This involves a call to ScriptEngine.SetSearchPaths.


The .NET Dynamic Language Runtime generally tries to map common CLR types such as integers and strings to the corresponding types in the language at hand. For the most part, this works pretty transparently, but sometimes the mismatch can throw up some odd errors. For example, when we set the Sass :load_paths option so that it could resolve @imported files, we started seeing weird “wrong number of arguments (2 for 1)” errors from the Sass compiler.

In IronRuby, strings should be represented as the built-in MutableString type. However, when using APIs such as ScriptScope.SetVariable, it’s possible to pass a .NET string. IronRuby won’t complain about this, and most of the time it will work. We’re not sure exactly what went wrong on the Sass :load_paths case, but we suspect it may have passed an is_a? String test despite not being a true mutable Ruby string. So the lesson is always to wrap strings using MutableString.CreateAscii before passing them to SetVariable.

A similar issue applies to strings returned from Ruby scripts. You can’t cast them to .NET strings, because they are instances of MutableString rather than String. Use ToString() instead.

Ruby versioning and compatibility

IronRuby 1.1.1 implements Ruby 1.9. However, although the implementation is ‘complete enough for Rails,’ it’s not 100% complete. We got a nasty surprise when the Sass compiler started raising missing method exceptions. There’s a place in Sass where it calls the chr function. Under 1.8 it calls this on a Fixnum, but when it thinks it’s running under 1.9, it calls it on a String. And IronRuby 1.1.1 doesn’t implement String.chr.

We were able to solve this problem fairly quickly once we’d identified it, by running some Ruby code during initialisation to re-open the String class and add the missing chr method. The lesson is more to be aware that third-party implementations of Ruby may not be 100% compatible — and again to carefully read exception messages!

(In most cases, Sass does detect IronRuby and falls back to 1.8 behaviour — this just seems to be a case that slipped through the net, but I thought I’d mention it because other Ruby libraries may not be aware of IronRuby’s quirks the way Sass is.)


Talking of exceptions, we struggled for a while to get meaningful error information out of Sass syntax error exceptions. This wasn’t because Sass didn’t provide the information; it was just that IronRuby didn’t make it easy to find. There are a couple of tricks to be aware of here.

First, the Ruby exception data isn’t presented as a nice simple property on the exception object. Instead, you need to call a static method on RubyExceptionData.

RubyExceptionData rubyException = RubyExceptionData.GetInstance(ex);

Once you have a RubyExceptionData object, you can get the Ruby backtrace, which is invaluable for diagnostics!

Second, when Sass sets exception attributes such as the line number in the SCSS source file, these are available as dynamic properties, but it’s not always obvious what dynamic property you need. I found the following snippet useful for listing the members available on an IronRuby exception that had propagated up into C#:


(The fully qualified type names are so that you can paste it into the Visual Studio Immediate window while debugging.) Armed with the list of Ruby members it was usually easy to figure out the dynamic call I needed:

object scssLine = ((dynamic)ex).sass_line();


In the same way as we run the real Ruby Sass compiler on our SCSS files, we run the real JavaScript CoffeeScript compiler on our CoffeeScript files. We looked at three options for this: running on Node.js under Cygwin, running on the DLR using IronJS, and running on a ground-up implementation using Jurassic. The Node approach turned out awfully kludgy, involving as it did launching a separate process through an exciting collection of command scripts, so as with Ruby we turned to the DLR. Unfortunately, IronJS wasn’t able to run the CoffeeScript compiler (though the team are looking at the issue and I believe they now have a fix), so Jurassic it was. Jurassic uses a lot of the same terminology as the DLR so it’s a familiar programming experience, but it doesn’t actually use the DLR under the covers so you can’t use DLR tricks like C# dynamic and IDynamicMetaObjectProvider against it.

Still, running the compiler under Jurassic worked nicely. The CoffeeScript compiler is available as a standalone JavaScript file: we could just load that in from a file or string resource and we were good to go. However, as we had initially planned with Sass, we also wanted to use the CoffeeScript parser rather than writing our own, and that turned out to be a bit more challenging, as we could no longer use the standalone compiler — and the ‘non-standalone’ version worked only on Node.

Running Node modules under Jurassic

The solution we adopted was to implement enough of the Node environment to make the CoffeeScript parser happy. Fortunately, there wasn’t very much of this: in fact, it turned out all we needed to do was implement the require function, which the CoffeeScript parser uses to read in the other files it depends on such as the lexer.

This was reasonably easy to do using Jurassic’s ScriptEngine.SetGlobalFunction method. This allowed us to implement file content resolution in C# or F#, where we had access to the System.IO and System.Resources APIs, and assign that resolution function to the name ‘require.’ Then JavaScript calls to require would end up getting handled by the host function, which loaded the content of the ‘required’ file, passed it back to Jurassic to be executed and captured the results to be returned to the requirer.

The technique of mapping Node externals to host functions should allow arbitrary Node modules to be run under Jurassic, though it could require a fair bit of work if the module has a lot of Node-specific dependencies!


Similar to Ruby, strings returned from JavaScript aren’t always .NET System.Strings. (Sometimes they are, sometimes they aren’t. It depends on how they were constructed inside the JavaScript code.) Again, use ToString() rather than trying to cast to string.

Strange things in close up

JavaScript is famed for its quirks and pitfalls, and Jurassic makes it easy to enjoy them in the usually drearily regular setting of the CLR. For example, would you guess that these do the same thing?

engine.Execute("a = { }");
engine.SetGlobalValue("a", engine.Evaluate("{ }"));

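Of course they don’t. The first sets a to an empty object, as you would expect. The second sets a to undefined, because JavaScript parses a bare { } at the start of a statement as an empty block rather than as an object literal. Wrapping the literal in parentheses, as in "({ })", should force it to be read as an expression. So when injecting code into Jurassic from the host, watch out for JavaScript evaluation quirks!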


The polyglot approach is tremendously powerful, but inevitably the integration isn’t completely seamless. Expecting it to be is probably unrealistic: different languages have different conceptual models, and different semantics even for supposedly common types such as strings.

I’m conscious that this article has focused on the difficulties and surprises, and I want to re-emphasise that the polyglot approach was definitely a net win for us. We simply could not have delivered Web Workbench as a monolingual program. The parsers that underlie the syntax highlighting and intellisense would have been much harder to develop, even with the help of a C#-friendly toolkit like ANTLR, and we couldn’t even have attempted the compilers. I guess we could have shelled out to Ruby and Node but the deployment issues would have been horrible, and of course that’s really just concealing the polyglot nature rather than getting rid of it.

The boundaries between the C# and the Ruby/JavaScript code were pretty well defined and well bounded. Integrating our own F# code into C# was a more interesting design exercise: as I mentioned, some idioms don’t translate well, so identifying the right boundary between the two languages is very important to keeping each side idiomatic and avoiding warts like delegates in F# and ItemX calls in C#. We end up shuttling things back and forth across the language boundary quite a lot in order to do the right processing in the right place, but the experience is generally pretty seamless.

Based on our experience with Web Workbench, polyglot programming isn’t something you should do lightly. If you’ve only got one or two files that would be better off as F# or Ruby or JavaScript, then it’s probably more efficient to port them to your language of choice than to deal with the integration and deployment issues. But it’s nothing to be afraid of, either. In this post I’ve tried to alert you to some of the things you’ll encounter and how we solved them, but I’ve also tried to emphasise that the difficulties were outweighed by the benefits. We hope this will encourage you to keep your eyes open for places where polyglot programming can help you too!


Tagged as F#

Object expressions in F#

Today I’ve been building a simple auditing feature for LightSpeed which automatically logs who created, deleted or last modified an entity. One of the wrinkles in this feature is of course that the way to get the username differs from application to application: in a Windows client application, it might be the logged-on user, but in a Web application with Forms authentication we’d want it to be the HttpContext user rather than the operating system user account. Obviously, this is a great fit for the strategy pattern, and that’s exactly what I used: I defined an interface for getting the current user, and built Windows client and ASP.NET implementations of that interface.

internal class WindowsIdentityAuditInfoStrategy : IAuditInfoStrategy
{
  public static readonly IAuditInfoStrategy Instance = new WindowsIdentityAuditInfoStrategy();
  private WindowsIdentityAuditInfoStrategy() { }
  public AuditInfoMode Mode { get { return AuditInfoMode.WindowsIdentity; } }
  public string GetCurrentUser() { /* super-secret Mindscape proprietary code */ }
}

The annoying thing about this is that I don’t really need all the overhead of a class, and could also do without the trickery required to make it a singleton. What I’d really like to do is declare an object that implements the interface. That would save me writing all the class verbiage, and also make it clearer to readers of my code that the object was a singleton.

If I’d been using F#, I could have done exactly this. F# has a feature called object expressions, which basically allow you to create an object with members and behaviour, without creating a class to contain those members and behaviour. Let’s look at an example.

let windowsIdentityAuditInfoStrategy = {
  new IAuditInfoStrategy with
    member this.Mode = AuditInfoMode.WindowsIdentity
    member this.GetCurrentUser() = WindowsIdentity.GetCurrent().Name }  // oh no!  I gave it away!
printfn "%A" (windowsIdentityAuditInfoStrategy.GetCurrentUser())
// prints "athena\Ivan"

What’s going on here? That opening brace followed by new IAuditInfoStrategy tells F# that this is an object expression: that I want an object that implements IAuditInfoStrategy as shown. F# obligingly creates a suitable object by some nefarious means which I don’t have to worry about, and returns it into my variable.

Object expressions aren’t limited to interfaces. You can use them whenever you want to inject a bit of implementation without creating a whole new type:

let purportedDromedary = {
  new Object() with
    member this.ToString() = "I'm a dromedary" }
printfn "%A" purportedDromedary
// prints I'm a dromedary

Nor are they limited to singletons. In fact a handy feature of F# object expressions is that they can capture local variables, just like closures do:

let purported thingy = { new Object() with member this.ToString() = "I'm a " + thingy }
printfn "%A" (purported "wildebeeste")
// prints I'm a wildebeeste
printfn "%A" (purported "trask")
// prints I'm a trask

Obviously impersonating equatorial wildlife is fairly specialised as business requirements go, so let’s go back to the auditing example. In LightSpeed, the IAuditInfoStrategy is public, so that customers can implement their own application-specific user identification strategies. Perhaps the application has its own custom security system. Perhaps it is integrated into Lotus Notes... but no, some things are too horrible to contemplate. Regardless, we might find it useful to define a common base class for custom strategies:

[<AbstractClass>]  // required because GetCurrentUserCore has no implementation
type CustomAuditInfoStrategy() =
  interface IAuditInfoStrategy with
    member this.Mode = AuditInfoMode.Custom
    member this.GetCurrentUser() = this.GetCurrentUserCore()
  abstract member GetCurrentUserCore : unit -> string

So far, this is just a normal F# abstract class, to save us having to reimplement Mode every time. Let’s put it to the all-important work of making someone else carry the can:

let blame x = {
  new CustomAuditInfoStrategy() with
    member this.GetCurrentUserCore() = x }

What does the blame function do? It creates a new CustomAuditInfoStrategy whose GetCurrentUserCore method always returns our designated scapegoat. Notice that we haven’t had to create a ScapegoatingStrategy class to contain this brutal but realistic business logic: we can do it inline in the blame function. This keeps the blaming logic within the blame function instead of spinning it out to a separate location. And of course we can call the function multiple times to shift the blame around as required. Let’s try it out:

let context = new LightSpeedContext()
context.AuditInfoMode <- AuditInfoMode.Custom
context.CustomAuditInfoStrategy <- blame "bob"
printfn "%A" (context.CustomAuditInfoStrategy.GetCurrentUser())
// prints "bob"
context.CustomAuditInfoStrategy <- blame "kate"
printfn "%A" (context.CustomAuditInfoStrategy.GetCurrentUser())
// prints "kate"

Of course, under the surface, the F# compiler is defining derived classes of CustomAuditInfoStrategy just as you would if you were implementing this in C#. (Load the compiled F# code into a C# decompiler such as Reflector if you want to see what’s really going on.) But at the source code level, you don’t need to worry about this: you can just spin up objects to implement interfaces or abstract classes, or just to tweak concrete classes, and save yourself the overhead of writing a special class definition out longhand.
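The LightSpeed types used above aren’t available outside the product, so here is a self-contained sketch of the same idea that you can paste into F# Interactive. The AuditInfoMode and IAuditInfoStrategy definitions below are hypothetical stand-ins I’ve assumed for illustration; they are not the real LightSpeed declarations.

```fsharp
// Hypothetical stand-ins for the LightSpeed types, so the sketch runs on its own.
type AuditInfoMode =
    | WindowsIdentity = 0
    | Custom = 1

type IAuditInfoStrategy =
    abstract Mode : AuditInfoMode
    abstract GetCurrentUser : unit -> string

// Abstract base class: implements Mode once and defers the user lookup.
[<AbstractClass>]
type CustomAuditInfoStrategy() =
    abstract member GetCurrentUserCore : unit -> string
    interface IAuditInfoStrategy with
        member this.Mode = AuditInfoMode.Custom
        member this.GetCurrentUser() = this.GetCurrentUserCore()

// Object expression: derives from the base class inline and captures the
// scapegoat argument, with no named subclass in sight.
let blame scapegoat =
    { new CustomAuditInfoStrategy() with
        override this.GetCurrentUserCore() = scapegoat }

let strategy = blame "bob" :> IAuditInfoStrategy
printfn "%s (%A)" (strategy.GetCurrentUser()) strategy.Mode
```

Each call to blame produces a fresh object with its own captured scapegoat, which is the closure-like behaviour described earlier.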


Recursive iterators in F#

C#’s iterator feature makes it very easy to produce sequences without having to build your own data structure, which is particularly handy for lazy sequences. However, one common niggle is that if the iterator obtains a sequence of things it wants to include in its returned sequence, it has to explicitly traverse that sequence, yielding them one at a time. This often turns up when the iterator calls itself recursively, such as when traversing a hierarchy. For example, consider an iterator that gets all the files in or below a specified directory:

public static IEnumerable<string> FilesBelow(string directory)
{
  foreach (var file in Directory.EnumerateFiles(directory))
    yield return file;
  foreach (var subdir in Directory.EnumerateDirectories(directory))
  {
    // yield return FilesBelow(subdir);  // won't compile
    foreach (var file in FilesBelow(subdir))
      yield return file;
  }
}

Note the extra foreach loop over the results of the recursive call.

F# has something similar to iterators, which it calls sequence expressions. A sequence expression is introduced by the term seq, and similar to C# it uses the term yield to produce values:

seq { for n in 1 .. 10 do yield n * n }  // yields 1, 4, 9 ... 81, 100

(You can use -> as a shorthand for do yield here, but I’ve spelled it out explicitly because of what’s coming next.)

Like a C# iterator, an F# sequence expression can call functions which return sequences and yield up the results. We can therefore rewrite our C# directory flattening function like this:

open System.IO

let rec filesBelow directory =
  seq {
    for f in Directory.EnumerateFiles(directory) do yield f
    for d in Directory.EnumerateDirectories(directory) do
      for f in (filesBelow d) do
        yield f
  }

Unlike C#, however, F# sequence expressions do have a special keyword for yielding an entire sequence. That keyword is yield!, pronounced yield-bang. yield! allows us to tighten up our recursive iterator nicely:

let rec filesBelow directory =
  seq {
    for f in Directory.EnumerateFiles(directory) do yield f
    for d in Directory.EnumerateDirectories(directory) do yield! (filesBelow d)
  }

Using yield! on the results of the recursive call allows us to avoid the noise of the extra ‘for’ expression.
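If you want to try yield! without pointing it at your file system, the same shape works on any recursive structure. Here is a small sketch using a made-up Tree type; the type, the labels function and the sample data are all hypothetical, invented purely for illustration:

```fsharp
// A hypothetical in-memory hierarchy: each node has a label and child nodes.
type Tree = Node of string * Tree list

// Lazily yield this node's label, then everything below it.
let rec labels (Node (label, children)) : seq<string> =
    seq {
        yield label
        for child in children do
            yield! labels child   // splice in the whole recursive result
    }

let tree = Node ("root", [ Node ("a", [ Node ("a1", []) ]); Node ("b", []) ])
printfn "%A" (List.ofSeq (labels tree))
```

The labels come out in depth-first order, parent before children, just as the files do in filesBelow.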


Functions versus member methods in F#

F# is a hybrid object-functional language, and allows you to write code in member methods (like C# methods) or in global functions. The F# library contains several cases where a member and a global function do the same thing. For example:

> let l = [ 1; 2; 3 ];;
val l : int list = [1; 2; 3]
> l.Length;;
val it : int = 3
> List.length l;;
val it : int = 3

Several other List module functions, such as head and tail, are also duplicated by properties. It’s not just lists, either: for example, the .NET String.Length property is duplicated by the F# library String.length module function.

So when would you choose a member such as .Length over a function such as List.length or String.length, or vice versa?

The answer, perhaps surprisingly, is that you should usually choose the function. The reason is to do with F# type inference.

In F# code, you don’t usually need to specify the types of variables and parameters, because the compiler will work them out by analysing the code. Take a look at the following code fragment:

let twiceLength a = 2 * List.length a

F# infers that twiceLength takes a list and returns an int. We’ve not had to specify this: F# has worked it out. How?

Well, when F# tries to work out the type of a, it looks at how a is used in the body of the function. It sees that a is passed to the List.length function. Now it looks at the List.length function and sees that its argument is of list type. So, F# reasons, a must be of list type. Job done!

But what if we wrote the function using a member?

let twiceLength a = 2 * a.Length

Oh no! We get a compiler error, “Lookup on object of indeterminate type based on information prior to this program point.” What does this mean? It means that F# can only figure out that a must be something with a .Length member. And that’s not enough to pin down the type. List has a .Length member, String has a .Length member, ExperimentalGermanFilm has a .Length member… F# can’t tell which of these is intended, so automatic type inference fails, and we have to go back to writing it out longhand:

let twiceLength (a : string) = 2 * a.Length  // Now F# can resolve the .Length call

In terms of the amount of code, there isn’t much to choose between the two. And of course, if you’re working with a type that only comes with members, not helper functions, then you’ll have to go the type annotation route — not that there’s anything wrong with that. But type inference is a bit more idiomatic where you have a choice.
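Here are the options side by side, as a sketch you can run in F# Interactive. The third function goes slightly beyond what I discussed above: it relies on the fact that F# infers types left to right, so piping an already-typed value into a lambda is usually enough to let member access resolve. The function names are made up for illustration.

```fsharp
// Module function: the argument type is inferred from List.length.
let twiceLength a = 2 * List.length a

// Member access, with a type annotation on the parameter.
let twiceStringLength (s: string) = 2 * s.Length

// Member access inside a lambda: `words` is already known to be a string
// list and sits to the left of the lambda, so s.Length resolves.
let totalLength (words: string list) =
    words |> List.sumBy (fun s -> s.Length)

printfn "%d %d %d" (twiceLength [ 1; 2; 3 ]) (twiceStringLength "abc") (totalLength [ "ab"; "cde" ])
```

Note that twiceLength ends up generic over the element type, which is one more thing you get for free by letting the module function drive inference.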

And that, my liege, is how we know the earth to be banana shaped.


First-class functions in F#, part 0

I wrote a while back about how F# makes it easy to work with functions, but this recent question on Stack Overflow reminded me that I hadn’t really talked about the fundamental difference between F#’s first-class functions and C#/VB’s delegates. In C# and Visual Basic, there are delegate types for all kinds of function signatures, but there’s no canonical function type. Whereas in F#, function types are part of the language.

Consider the following line of C#:

var isReticulated = delegate(Spline s) { return s.IsReticulated; };

As the Stack Overflow poster discovered, this doesn’t compile. Why not? Because although the right hand side is clearly a function from Spline to bool, C# doesn’t recognise that as a type. C# only recognises specific delegate types such as Func<Spline, bool>, Predicate<Spline> or Converter<Spline, bool>. So you have to pick one of these and spell it out:

Predicate<Spline> isReticulated = delegate(Spline s) { return s.IsReticulated; };

And woe betide you if you have to deal with APIs that have made different choices, because there’s no conversion between ‘compatible’ delegate types:

public Predicate<Person> GetFilter(FilterSpecification spec) { ... }
public IEnumerable<Person> GetFilteredPeople(FilterSpecification spec)
{
  var filter = GetFilter(spec);
  return _people.Where(filter);  // compiler error
}

The Where method demands a Func<T, bool>, so if all you have is a Predicate<T> then you’re stuffed, even though they’re both functions from T to bool! Instead you have to write a guffy little adapter:

return _people.Where(p => filter(p));  // everybody loves code noise

Contrast this with F#, in which functions do have a type:

let isReticulated = fun (s : Spline) -> s.IsReticulated

The compiler figures out the type as Spline -> bool. Which means you can pass this function to anything that expects a function from Spline to bool. You don’t need to worry about the difference between Funcs, Predicates, Converters and goodness knows what else: instead, function types are standardised at the language level.

let reticulatedSplines = List.filter isReticulated splines  // List.filter takes a 'T -> bool
let antifilter predicate list = List.filter (predicate >> not) list  // Compiler infers type of predicate as 'T -> bool
let unreticulatedSplines = antifilter isReticulated splines  // so isReticulated is compatible with antifilter

Any function with a function argument can work with any function with the right signature — not just ones that happen to have been packaged up into the right delegate type.
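To make the snippets above runnable, here is a self-contained sketch. The Spline record and the sample data are hypothetical stand-ins I’ve assumed for illustration; the last two lines show that when you do need a specific delegate type for a .NET API, you can wrap the same function value explicitly.

```fsharp
// Hypothetical Spline type, just enough to exercise the functions.
type Spline = { Name: string; IsReticulated: bool }

let isReticulated = fun (s: Spline) -> s.IsReticulated
let antifilter predicate list = List.filter (predicate >> not) list

let splines =
    [ { Name = "a"; IsReticulated = true }
      { Name = "b"; IsReticulated = false } ]

// One function value, passed unchanged to several different consumers.
let reticulated = List.filter isReticulated splines
let unreticulated = antifilter isReticulated splines
let anyReticulated = Seq.exists isReticulated splines

// Wrapping the function in a delegate when a .NET API asks for one.
let asPredicate = System.Predicate<Spline>(isReticulated)
let firstIsReticulated = asPredicate.Invoke(List.head splines)

printfn "%A %A %b %b" reticulated unreticulated anyReticulated firstIsReticulated
```

The conversion to a delegate only happens at the edge where a delegate is demanded; everywhere else the function is just a value of type Spline -> bool.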

And that, at bottom, is why functions are first class in F#, and not in C# or Visual Basic.


