Notes
2026/03/21

Staged programming and the principle of least power

I've recently been thinking about how to design a bytecode virtual machine that uses the same flat bytecode representation both for its input (so far, so normal) and for its output (with data structures represented as a flat bytecode tree). Why? Because going from bytecode to bytecode makes it possible to do staged programming, where a program's output is again an executable program that can be evaluated by the VM.
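A minimal sketch of the idea in Python (the opcode names and the list-of-opcodes encoding are made up for illustration, not taken from any actual VM):

```python
# Values and programs share one flat representation: a list of opcodes
# and immediates. Evaluating a program yields another such list, which
# can be fed straight back into evaluate() as the next stage.
PUSH, ADD = "PUSH", "ADD"

def evaluate(code):
    """Run flat bytecode; the result is again flat bytecode."""
    stack = []
    i = 0
    while i < len(code):
        if code[i] == PUSH:
            stack.append(code[i + 1])
            i += 2
        elif code[i] == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            i += 1
    # Re-encode the result as bytecode: the output is itself a program.
    out = []
    for value in stack:
        out.extend([PUSH, value])
    return out

stage1 = evaluate([PUSH, 1, PUSH, 2, ADD])  # -> [PUSH, 3]
stage2 = evaluate(stage1)                   # running the output again
```

The key property is in the last two lines: because the output is valid input, evaluation composes with itself.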

From a concrete implementation perspective, this is interesting for compiling away the overhead of macros. This could be done with a binary split into comptime evaluation and runtime evaluation. Using the same bytecode format for input and output is then merely a way to use the same VM for both.
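A sketch of that binary split (the macro representation here is hypothetical): the comptime stage expands macros into plain bytecode, and the runtime stage interprets the residual bytecode with no macro machinery left.

```python
# Hypothetical two-stage split: at comptime, "macro" opcodes are called
# with their argument and must return bytecode; at runtime, the residual
# bytecode runs with no macro overhead remaining.

def expand(code, macros):
    """Comptime stage: run macros, emitting plain runtime bytecode."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in macros:
            out.extend(macros[op](code[i + 1]))  # macro returns bytecode
            i += 2
        else:
            out.append(op)
            i += 1
    return out

def run(code):
    """Runtime stage: same flat format, but nothing left to expand."""
    stack = []
    i = 0
    while i < len(code):
        if code[i] == "PUSH":
            stack.append(code[i + 1])
            i += 2
        elif code[i] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            i += 1
    return stack

# A macro that unrolls "push n twice and add" at compile time:
macros = {"DOUBLE": lambda n: ["PUSH", n, "PUSH", n, "ADD"]}
runtime_code = expand(["DOUBLE", 21], macros)
print(run(runtime_code))  # [42]
```

Note that `expand` and `run` consume and produce the same flat format, which is exactly what lets one VM serve both stages.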

The principle of least power

But once we use the same format for input and output, why stop at just two stages? Why not go for an arbitrary number of stages at comptime and then leave the last stage for runtime? Or maybe even multiple stages at comptime, and then multiple stages at runtime, similar to just-in-time compiling code at runtime?

Why go with just two stages, when we could have more power? As tempting as that sounds, remember the principle of least power:

The choice of language is a common design choice. The low power end of the scale is typically simpler to design, implement and use, but the high power end of the scale has all the attraction of being an open-ended hook into which anything can be placed: a door to uses bounded only by the imagination of the programmer.

As our friendly neighborhood language designer likes to remind us, more power never comes without tradeoffs. Why not use late binding and dynamic dispatch everywhere? Why not use multi-shot coroutines instead of boring one-shot coroutines? (Why not use lambda calculus for everything, where you can do anything. Anything at all. The only limit is yourself.) Because more powerful languages are harder both to implement and to use: they provide fewer guarantees and affordances. Limiting the power of a language increases our power of reasoning about the language and its programs.

Symbiotic programming, staged programming

What the principle of least power doesn't talk about, however, is that in practice most languages grow and accumulate power, because having a powerful language is really useful when you just want to ship this one feature at 10pm on a Friday night and yes I do want my language to be a Turing-complete mess and no I don't care about the philosophical implications of it thank you very much. At the end of the day, power is useful, and power is very hard to take away once you've tasted it. Exhibit A: modern (Turing-complete) CSS. Exhibit B: modern JavaScript.

There seems to be only one alternative to accepting the inevitable and letting a language's power grow: Some languages manage to remain constrained in power by providing an escape hatch to reach for a different, more powerful language where occasionally necessary. Examples are embedded languages that rely on a host environment, or languages with an “unsafe” superset, like Rust.

This is what makes staged programming interesting: Having an arbitrary number of program stages makes it possible to have an arbitrary number of language dialects with different power, where each stage does just what it has to, but not more. Instead of paying for late binding everywhere in the language, we often only need it in a few isolated places and can evaluate it away, at compile time, everywhere else. Similarly, we might happily pay the performance cost of powerful features such as late binding or multi-shot coroutines for code running at compile time, as long as we know that all of the resulting runtime code will be free of these constructs. Viewed from this angle, a dynamically typed escape hatch like “any” defers parts of evaluation (like type checks) to a later stage at runtime, giving us a more powerful language, at the cost of fewer guarantees.
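That last point can be sketched as a toy staged checker (the `decls` format and function names are hypothetical): statically declared types are verified at comptime, and only values declared as "any" carry a deferred check into the runtime stage.

```python
# Hypothetical staged type checker: known types are checked now, at
# comptime; only the "any" escape hatch postpones its check to runtime.

def comptime_check(decls):
    """decls: list of (name, declared_type, value) triples."""
    deferred = []
    for name, declared, value in decls:
        if declared == "any":
            deferred.append((name, value))  # check postponed to runtime
        elif not isinstance(value, declared):
            raise TypeError(f"{name}: expected {declared.__name__}")
    return deferred

def runtime_check(deferred, expected):
    """The later stage pays for checks only where 'any' was used."""
    for name, value in deferred:
        if not isinstance(value, expected[name]):
            raise TypeError(f"{name} failed its deferred check")
    return True

deferred = comptime_check([("x", int, 1), ("y", "any", "hi")])
runtime_check(deferred, {"y": str})  # only y is checked at runtime
```

The asymmetry is the point: `x` costs nothing at runtime, while `y` bought flexibility at comptime by paying with a runtime check.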

This even applies to multiple runtime stages: Let's say we write a program that accepts a regex and a file as inputs, then checks whether the contents of the file match the regex. With staged programming, we could first evaluate the regex at runtime and only then (in a second runtime stage) run the specialized bytecode on the file, which will often be faster than interpreting the regex ad hoc. (This is effectively just partial evaluation / program specialization / JIT compilation.)
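A toy version of those two runtime stages, for a pattern language of literal characters and the wildcard "." (a real regex engine would of course compile much richer bytecode than this):

```python
# Stage 1 "compiles" a fixed pattern into a specialized matcher function;
# stage 2 runs that matcher over inputs with the pattern already gone.

def specialize(pattern):
    """Stage 1: partially evaluate the match loop for a fixed pattern."""
    checks = [
        (lambda c: True) if p == "." else (lambda c, p=p: c == p)
        for p in pattern
    ]
    def matcher(line):
        # Stage 2: only the specialized per-character checks remain.
        if len(line) != len(checks):
            return False
        return all(check(c) for check, c in zip(checks, line))
    return matcher

match = specialize("a.c")   # pay the specialization cost once...
print(match("abc"), match("axc"), match("abd"))  # True True False
```

This is the same pattern as `re.compile` in Python's standard library: compile the pattern once, then run the compiled form over many inputs.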

Stages and power

How do different stages correlate with power? Are early stages more or less powerful than later ones? A virtual machine could expose different capabilities at different stages, for example allowing network access either only at comptime or only at runtime, with good arguments for either choice. (Maybe we want comptime evaluation to be deterministic and don't allow network access at comptime? Or maybe we want network access at comptime specifically so that we can pull in dependencies, but want the resulting runtime program to not make network requests?)
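One way to make that concrete (all names here are hypothetical, and this sketch picks the second policy: network access at comptime only):

```python
# Hypothetical capability gating by stage: the comptime stage may reach
# the network (e.g. to pull dependencies), but the residual runtime
# program is denied that capability.

CAPABILITIES = {
    "comptime": {"net", "fs"},
    "runtime": {"fs"},
}

def require(stage, capability):
    if capability not in CAPABILITIES[stage]:
        raise PermissionError(f"{capability!r} not allowed at {stage}")

def fetch_dependency(stage, url):
    require(stage, "net")  # fine at comptime, raises at runtime
    return f"<contents of {url}>"

fetch_dependency("comptime", "https://example.com/pkg")  # ok
try:
    fetch_dependency("runtime", "https://example.com/pkg")
except PermissionError as e:
    print(e)  # 'net' not allowed at runtime
```

Flipping the table entries expresses the opposite policy (deterministic comptime, network only at runtime) without touching the rest of the machinery.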

Typechecking allows us to write down assumptions about our program in a less powerful, more restrictive language at comptime, so that we can run a more powerful language at runtime. Metaprogramming and unrestricted macros allow us to specify our program in a more powerful language at comptime, so that we can run a more restrictive language without reflective or metaprogramming capabilities at runtime.

Comptime isn't necessarily more or less powerful than runtime, and neither are different comptime stages, nor different runtime stages. It is sometimes useful to have a more powerful language operate on and return a less powerful language, and sometimes the opposite. After all, why should either comptime or runtime be tied to more or less power? They are just different times for running computation (and potentially causing side effects). But the point is that they are different points in time, allowing us to distinguish between more and less powerful phases in our program lifecycle, instead of having to pick the right level of power once and for all. Who wouldn't pick the more powerful language in such a scenario?

The main benefit of staged programming is that it allows us to choose where and how we need more power, so that the remaining stages can operate with fewer assumptions and in a less powerful environment.