Reflection, performance and runtime code generation

If you’re developing a library or a utility module, you’ll often need to make it work with different types, including types that you don’t know about at build time. In many cases, you can handle this with .NET generics, but sometimes you need to work with the specific features of types, without knowing what those types are. For example, in LightSpeed, we need to be able to set the fields of an entity from a database, without knowing what those entity fields are. Or if you’re writing a generic Clone method, you need to be able to copy fields from one object to another. You can’t do this with generics because you need to access the specific fields, whereas generics only allow you to access members exposed through interfaces or other constraints declared at design time.

The traditional solution to this is to use Reflection. Reflection gives you a way to enumerate and invoke methods, properties, constructors and fields. It’s pretty easy to write the ‘set all fields’ or ‘copy all fields’ code this way. However, the big problem is that it’s slow. Horribly, horribly slow.
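
As a rough illustration of the ‘copy all fields’ case, here is a minimal Reflection-based sketch. SampleEntity and ReflectionCopier are hypothetical names invented for this example, not part of LightSpeed or any other library.

```csharp
using System;
using System.Reflection;

// Hypothetical entity type, invented for this sketch.
public class SampleEntity
{
    private int id;
    public string Name;

    public SampleEntity() { }
    public SampleEntity(int id, string name) { this.id = id; Name = name; }

    public int Id { get { return id; } }
}

public static class ReflectionCopier
{
    // Copies every instance field from source to target using Reflection.
    // Note: GetFields does not return private fields declared on base classes.
    public static void CopyFields(object source, object target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        Type type = source.GetType();
        if (!type.IsInstanceOfType(target))
            throw new ArgumentException("target must be a " + type.Name, "target");

        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo field in type.GetFields(flags))
        {
            // One reflective read and one reflective write per field, per call.
            field.SetValue(target, field.GetValue(source));
        }
    }
}
```

Every call repeats the field lookup and goes through FieldInfo.GetValue and SetValue, which is where the cost comes from when you copy many objects.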

If your code runs only occasionally, this may not be a big deal, but if you’re working with a lot of objects, Reflection code quickly becomes a bottleneck. At this point, it’s worth looking at an alternative: runtime code generation.

The idea of runtime code generation goes something like this.

  • If we could write code that was specific to each type we had to handle, then that would run really fast.
  • But we can’t do that, because we don’t have access to the types that the user of our library is going to come up with.
  • But if we did have access to those types, the code we’d write for each type would follow a predictable pattern (maybe simple, maybe complicated — but there must *be* a pattern or we couldn’t be writing a generic library in the first place!).
  • Now if the type-specific code follows a predictable pattern, we can write a program where we give it a type, and it generates and compiles the type-specific code.
  • And in that case, we can run that program at run-time, and it will create fast, type-specific methods for each of our user’s types!

At first this may sound crazy. Running a code generator and compiler sounds like it will be even slower than using Reflection. And won’t the code generator have to use Reflection to get the details of the user type anyway? Yes and yes. But the code generator and compiler have to run only once per type. They produce a compiled method, and we can call that method again and again, without needing to regenerate or recompile it.
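
The ‘once per type’ idea can be sketched as a cache of compiled delegates keyed by type. This is only one possible shape: FactoryCache and CompileCount are hypothetical names, and the compile step happens to use the expression tree technique described later in this article.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Threading;

public static class FactoryCache
{
    private static readonly ConcurrentDictionary<Type, Func<object>> Cache =
        new ConcurrentDictionary<Type, Func<object>>();

    // Illustration only: counts how many times the expensive step has run.
    public static int CompileCount;

    public static object Create(Type type)
    {
        // The first call for a type pays for code generation;
        // later calls just look up and invoke the cached delegate.
        return Cache.GetOrAdd(type, Compile)();
    }

    private static Func<object> Compile(Type type)
    {
        Interlocked.Increment(ref CompileCount);

        // Requires a public parameterless constructor (or a value type).
        Expression body = Expression.Convert(Expression.New(type), typeof(object));
        return Expression.Lambda<Func<object>>(body).Compile();
    }
}
```

Under contention, ConcurrentDictionary.GetOrAdd may run the factory more than once for the same key, so the compile step should be free of side effects that matter.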

Okay, so maybe it’s not crazy, but isn’t it fearfully difficult? Well, it can be, but it certainly doesn’t have to be. We’ll look at three ways, starting out at the highest level and gradually dropping down to the lowest.

Generating code using a high-level language

The easiest way to generate code is as C# or Visual Basic! For example, you could build up C# code as a string, either by concatenating strings or by hosting a templating engine such as T4. You can then compile the generated code using CodeDom.

This is pretty easy, though it requires careful attention to detail. However, CodeDom is quite heavyweight and slow, and you have to basically compile an entire assembly, which means you incur even more overhead creating classes and locating your generated method in the assembly. Since we’re doing this for performance reasons, this often makes CodeDom a somewhat unattractive choice, but for some scenarios it can be excellent.

Generating code with expression trees

Expression trees were introduced as part of LINQ in .NET 3.5. An expression tree is like an abstract representation of a piece of code, not bound to any particular language. Usually the compiler creates expression trees for you as part of a LINQ expression, but you can also create them yourself using the expressions API, and (tada!) you can compile them to delegates.

The trick to runtime code generation using expression trees is to figure out what your desired code would look like as a lambda. For example, suppose you want to generate a method to instantiate objects of an unknown type — basically the same as Activator.CreateInstance, but fast. (I’m going to ignore the possibility of using generics and the new() constraint, which would solve this problem in many, but not all, cases, because I want to show you expression trees, and this is a nice simple example.) Here’s a lambda that would do that:

Func<T> creator = () => new T();
// usage: T instance = creator();

How can we create a corresponding expression tree for a type? Fortunately, the expressions API corresponds pretty closely to the constructs we used in our lambda:

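// using System.Linq.Expressions;

// "type" must be a class here; a struct would need an Expression.Convert to object
var lambdaBody = Expression.New(type);  // this corresponds to "new T()"
var lambdaExpr = Expression.Lambda<Func<object>>(lambdaBody);  // this corresponds to "() => ..."
Func<object> creator = lambdaExpr.Compile();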

The flow here is a bit tricky to read because it’s inside-out rather than left-to-right. We first create an expression representing the body of the lambda: “new T()”. Our body is so simple we can do this using a single Expression.New call. Then we wrap it in an Expression.Lambda, which roughly corresponds to sticking the “() =>” on the front. .NET requires the Expression.Lambda instead of allowing us to compile the body directly because in the more general case the lambda might have parameters.
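
To show what a lambda with parameters looks like, here is a hedged sketch of a property setter, roughly “(target, value) => ((T)target).Prop = (TProp)value”. SetterFactory and CreateSetter are hypothetical names; the body calls the property’s set method rather than using an assignment node, so it sticks to the simple expressions available in .NET 3.5.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class SetterFactory
{
    // Builds a delegate that sets a public property on an instance of "type".
    // Intended for classes: with a struct, the cast would unbox to a copy.
    public static Action<object, object> CreateSetter(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        MethodInfo setMethod = property == null ? null : property.GetSetMethod();
        if (setMethod == null)
            throw new ArgumentException("No public setter for " + propertyName, "propertyName");

        // The lambda's two parameters: (object target, object value)
        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // ((T)target).set_Prop((TProp)value)
        Expression body = Expression.Call(
            Expression.Convert(target, type),
            setMethod,
            Expression.Convert(value, property.PropertyType));

        // Parameters are passed to Expression.Lambda after the body.
        return Expression.Lambda<Action<object, object>>(body, target, value).Compile();
    }
}
```

For example, CreateSetter(typeof(UriBuilder), "Port") gives a delegate you can call as setPort(builder, 8080).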

If you’re wondering whether this is worthwhile, calling the “creator” delegate is now about five times faster than calling Activator.CreateInstance (at least on my machine). If the delegate were replacing more Reflection calls — as in the object cloning example — then the savings would probably be even greater.
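
If you want to check the difference on your own machine, a crude timing harness along these lines will do. CreationBenchmark is a hypothetical helper, and the ratio you see will depend on your runtime version and hardware, so treat the output as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

public static class CreationBenchmark
{
    public static Func<object> CompileCreator(Type type)
    {
        Expression body = Expression.Convert(Expression.New(type), typeof(object));
        return Expression.Lambda<Func<object>>(body).Compile();
    }

    // Times "iterations" calls to create, after one warm-up call
    // so that JIT compilation is not included in the measurement.
    public static TimeSpan Time(Func<object> create, int iterations)
    {
        create();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            create();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Run(Type type, int iterations)
    {
        Func<object> compiled = CompileCreator(type);
        TimeSpan viaActivator = Time(() => Activator.CreateInstance(type), iterations);
        TimeSpan viaDelegate = Time(compiled, iterations);

        Console.WriteLine("Activator.CreateInstance: {0} ms", viaActivator.TotalMilliseconds);
        Console.WriteLine("Compiled delegate:        {0} ms", viaDelegate.TotalMilliseconds);
    }
}
```

Call it with something like CreationBenchmark.Run(typeof(System.Text.StringBuilder), 1000000) in a Release build.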

Expression trees are reasonably easy to write, reasonably safe, and quite efficient. For many runtime code generation applications, these could well be the sweet spot.

Generating code using dynamic methods

Expression trees are spiffy, but have some limitations. One of these is that in .NET 3.5 you are limited to simple expressions. (This limitation is greatly relaxed in .NET 4, which provides new expression types to represent things like loops, if statements and goto. Yes, all this whizzy modern technology, and it still has goto.) Another is that you can’t use an expression tree to generate a member method, which is important if your generated code needs access to private members of the user type, as in the cloning example which needs access to private fields. For that, you need to drop down to the lowest level: generating IL directly using dynamic methods.

(By the way, dynamic methods aren’t the only place you can use the direct IL generation technique. If you need to generate an entire class or assembly, then you can’t use expression trees for that, so you have to either move up to CodeDom, or down to AssemblyBuilder, TypeBuilder and related Reflection.Emit classes, in which case you’ll be generating IL into the method bodies. I’m not going to cover that here; maybe some other time.)

Setting up a dynamic method is pretty easy. Here’s one for the ‘instantiate a new object’ example we had for expression trees:

// using System.Reflection.Emit;
 
var dynamicMethod = new DynamicMethod(
  "create_" + type.Name,          // name of the dynamic method
  typeof(object),                 // the return type
  new Type[0],                    // the dynamic method has no parameters
  type);                          // which type it's a member of
// TODO: create method body
// CreateDelegate returns a plain Delegate, so cast it to the delegate type we asked for
var creator = (Func<object>)dynamicMethod.CreateDelegate(typeof(Func<object>));

You just need to provide a name for the method, specify the signature (parameter and return types), and say which type the method is a member of, though this is usually only an issue if the method needs to access private members such as fields.

The tricky bit is generating the IL for the dynamic method. No friendly expression trees or C# compiler to help you here: just man versus IL opcode, two enter, only one leaves. This means you have two alternatives: (1) learn IL or (2) cheat.

I’m not going to talk about option 1.

To cheat, you write an example instance of your dynamic method in a high level language such as C#, baking in an example ‘user type’ you have created for the purpose. You then compile this in the usual way to produce an EXE or DLL. (Make sure you compile in Release mode — if you’re in Debug mode then the compiler will emit all sorts of extra nonsense which is no use to you.)

Now open your compiled EXE or DLL in ILDASM. Yes, that’s right, ILDASM, the application you last looked at in 2002 and which you thought had been completely superseded by Reflector. ILDASM will show you the IL op codes that your example method compiled to, and all you need to do is copy those op codes into your DynamicMethod, replacing specific references to fields, types, etc. of your example type with dynamically-generated references to the fields, types, etc. of the type for which you are creating the DynamicMethod.

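For example, for new-ing up an object with a default constructor, ILDASM shows just two instructions: a newobj that calls the example type’s constructor, followed by a ret.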

We can map this across pretty mechanically to IL op code objects in the System.Reflection.Emit.OpCodes class, and thereby generate the equivalent IL into our dynamic method.

var constructor = type.GetConstructor(new Type[] { });  // we want to call the default constructor
 
var ilGenerator = dynamicMethod.GetILGenerator();
ilGenerator.Emit(OpCodes.Newobj, constructor);
ilGenerator.Emit(OpCodes.Ret);
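
Putting the setup and the IL generation together, a complete version might look like the following sketch. DynamicFactory and CreateFactory are hypothetical names, and the code assumes the type is a class with a public parameterless constructor.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class DynamicFactory
{
    public static Func<object> CreateFactory(Type type)
    {
        // GetConstructor(Type.EmptyTypes) finds the public default constructor, if any.
        ConstructorInfo constructor = type.GetConstructor(Type.EmptyTypes);
        if (constructor == null)
            throw new ArgumentException(type.Name + " has no public default constructor", "type");

        var dynamicMethod = new DynamicMethod(
            "create_" + type.Name,  // name of the dynamic method
            typeof(object),         // the return type
            Type.EmptyTypes,        // no parameters
            type);                  // which type it's a member of

        ILGenerator il = dynamicMethod.GetILGenerator();
        il.Emit(OpCodes.Newobj, constructor);  // push a new instance onto the stack
        il.Emit(OpCodes.Ret);                  // return it

        return (Func<object>)dynamicMethod.CreateDelegate(typeof(Func<object>));
    }
}
```

A value type would additionally need a box instruction before the ret, which this sketch leaves out.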

Obviously, in most cases, the code generation would be more complex: for example, in the object cloning scenario, you’d loop over the fields you wanted to copy, emitting a Ldfld (load field) and Stfld (store to field) op code for each one until you’d built up the entire dynamic method.
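
Here is a hedged sketch of that field-copying loop. SamplePoint and FieldCopier are hypothetical names made up for the example; the code assumes a class (not a struct) and skips readonly fields rather than trying to write to them.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical example type, invented for this sketch.
public sealed class SamplePoint
{
    private int x;        // private: reachable because the dynamic method is a member of SamplePoint
    public string Label;

    public SamplePoint() { }
    public SamplePoint(int x, string label) { this.x = x; Label = label; }

    public int X { get { return x; } }
}

public static class FieldCopier
{
    // Generates a method equivalent to:
    //   static void copy(object source, object target)
    //   { ((T)target).f1 = ((T)source).f1; ((T)target).f2 = ((T)source).f2; ... }
    public static Action<object, object> CreateCopier(Type type)
    {
        var dynamicMethod = new DynamicMethod(
            "copy_" + type.Name,
            typeof(void),
            new[] { typeof(object), typeof(object) },
            type);

        ILGenerator il = dynamicMethod.GetILGenerator();
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo field in type.GetFields(flags))
        {
            if (field.IsInitOnly) continue;    // leave readonly fields alone

            il.Emit(OpCodes.Ldarg_1);          // target
            il.Emit(OpCodes.Castclass, type);  // (T)target
            il.Emit(OpCodes.Ldarg_0);          // source
            il.Emit(OpCodes.Castclass, type);  // (T)source
            il.Emit(OpCodes.Ldfld, field);     // read source.field
            il.Emit(OpCodes.Stfld, field);     // write target.field
        }
        il.Emit(OpCodes.Ret);

        return (Action<object, object>)dynamicMethod.CreateDelegate(typeof(Action<object, object>));
    }
}
```

The Reflection loop over the fields runs once, while the method is being generated. The resulting delegate contains only straight-line loads and stores.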

IL generation is low-level, but that makes it very efficient. There’s no extra compilation step as there was with expression trees. If you have to generate a lot of dynamic methods, then IL and DynamicMethod could be the way to go.

Conclusion

Most applications don’t need runtime code generation — either they know everything they need to know at compile time, or the performance considerations are such that Reflection is good enough. However, if you’re running a lot of Reflection code, then replacing it with type-specific code generated at runtime can give you a big performance boost. You’ll need to figure out how to trade off difficulty (and hence maintainability) against performance, but I hope this article gives you some idea of what the options are and what tradeoffs you’re making with each one. Have fun!
