in which a compiler takes steps towards strapping its boots

One of the biggest milestones in a programming language is when the language gets to the point where it can be used to write its own implementation, which is called self-hosting. This is seen as a sign of maturity since reaching this point requires getting a lot of common problems shaken out first.

The compiler for the Fennel programming language was written using Lua, and it emits Lua code as output. Over time, certain parts of the compiler were added that were written in Fennel, starting with fennelview, which is the pretty-printer for Fennel data structures. Once the macro system stabilized, many built-in forms that had originally been hard-coded into the compiler using Lua got ported to the macro system. After that the REPL was ported to Fennel as a relatively independent piece of code, followed by the command-line launcher script and a helper module to explain and identify compiler errors. The parser had already seen an impressive port to Fennel using a literate programming approach, but we hadn't incorporated this into the mainline repository yet because the literate approach made it a bit tricky to bring in.

As you might expect, any attempt at self-hosting can easily run into "chicken or egg" problems—how do you use the language to write the implementation if the language hasn't been finished being defined yet? Sometimes this requires simply limiting yourself to a subset; for instance, the built-in macros in Fennel cannot themselves use any macros but must be written in a macroless subset of Fennel. In other cases, such as the launcher, we keep a copy of the old pre-self-hosted version around in order to build the new version.

lake union/lake washington canal

That's about as far as we could get on the path to self-hosting without changing the approach, because most of the remaining code was fairly entangled, and we didn't have clear boundaries to port it one piece at a time. At this stage there were 2250 lines of Lua and 1113 lines of Fennel. I recently took some time to reorganize the compiler into four independent "pseudo-modules" with clear dependencies between the pieces. But even with the independent modules broken out, we were still looking at porting 800 lines of intricate compiler code and 900 lines of special forms all in two fell swoops.

That's when I started to consider an alternate approach. The Fennel compiler takes Fennel code as input and produces Lua code as output. We have a big pile of Lua code in the compiler that we want turned into Fennel code. What if we could reverse the process? That's when Antifennel was born.

(fn compile [ast tail?]
  (when (os.getenv "DEBUG") (print ast.kind))
  (match ast.kind
    "Chunk" (chunk (map ast.body compile true)) ; top-level container of exprs
    "LocalDeclaration" (local-declaration compile ast)
    "FunctionDeclaration" (declare-function compile ast)

    "FunctionExpression" (function compile ast)
    "BinaryExpression" (binary compile ast)
    "ConcatenateExpression" (concat compile ast)
    "CallExpression" (call compile ast)
    "LogicalExpression" (binary compile ast)
    "AssignmentExpression" (assignment compile ast)
    "SendExpression" (send compile ast)
    "MemberExpression" (member compile ast)
    "UnaryExpression" (unary compile ast)
    "ExpressionStatement" (compile ast.expression)

    "IfStatement" (if* compile ast tail?)
    "DoStatement" (do* compile ast tail?)
    "ForInStatement" (each* compile ast)
    "WhileStatement" (while* compile ast)
    "RepeatStatement" (repeat* compile ast)
    "ForStatement" (for* compile ast)
    "BreakStatement" (break compile ast)
    "ReturnStatement" (if tail?
                          (vals compile ast)
                          (early-return compile ast))

    "Identifier" (sym ast.name)
    "Table" (table* compile ast)
    "Literal" (if (= nil ast.value) (sym :nil) ast.value)
    "Vararg" (sym "...")
    nil (sym :nil)

    _ (unsupported ast)))

Antifennel takes Lua code and parses[1] it, then walks the abstract syntax tree of Lua and builds up an abstract syntax tree of Fennel code based on it. I had to add some features to fnlfmt, the formatter for Fennel, in order to get the output to look decent, but the overall approach is overall rather straightforward since Fennel and Lua have a great deal of overlap in their semantics.

The main difficulties came from supporting features which are present in the Lua language but not in Fennel. Fennel omits somethings which are normal in Lua, usually because the code becomes easier to understand if you can guarantee certain things never happen. For instance, when you read a Fennel function, you don't have to think about where in the code the possible return values can be found; these can only occur in tail positions because there is no early return. But Lua allows you to return (almost) anywhere in the function!

Fennel has one "secret" feature to help with this: the lua special form:

(lua "return nextState, value")

Included specifically to make the task of porting existing code easier, the lua form allows you to emit Lua code directly without the compiler checking its validity. This is an "escape hatch" that can allow you to port Lua code as literally as possible first, then come back once you have it working and clean up the ugly bits once you have tests and things in place. It's not pretty, but it's a practical compromise that can help you get things done.

Unfortunately it's not quite as simple as just calling (lua "return x"), because if you put this in the output every time there's a return in the Lua code, most of it will be in the tail position. But Fennel doesn't understand that the lua call is actually a return value; it thinks that it's just a side-effect, and it will helpfully insert a return nil after it for consistency. In order to solve this I needed to track which returns occurred in the tail position and which were early returns, so I could use normal Fennel methods for the tail ones and use this workaround hack only for early returns[2]. But that ended up being easier than it sounds.

Other incompatibilities were the lack of a break form (which could easily be addressed with the (lua "break") hack because it only happens in a non-tail position), the lack of repeat form (compiled into a while with a break at the end), and the fact that locals default to being immutable in Fennel and mutability is opt-in. This last one I am currently handling by emitting all locals as var regardless of whether they are mutated or not, but I plan on adding tracking in to allow the compiler to emit the appropriate declaration based on how it's used.

While it's still too early to swap out the canonical implementation of the Fennel compiler, the Antifennel-compiled version works remarkably well, passing the entire language test suite across every supported version of the Lua runtime at 79% the length of the Lua version. I'm looking forward to finishing the job and making the Fennel codebase written purely using Fennel itself.


[1] Antifennel uses the parser from the LuaJIT Language Toolkit, which is another self-hosted compiler that takes Lua code as input and emits LuaJIT bytecode without requiring any C code to be involved. (Of course, in order to run the bytecode, you have to use the full LuaJIT VM, which is mostly written in C.) I had to make one small change to the parser in order to help it "mangle" identifiers that were found to conflict with built-in special forms and macros in Fennel, but other than that it worked great with no changes. The first big test of Antifennel was making sure it could compile its own parser dependency from Lua into Fennel, which it could do on the second day.

[2] Even that is a slight oversimplification, because the lua return hack only works on literals and identifiers, not complex expressions. When a complex expression is detected being returned, we compile it to a wrapping let expression and only pass in the bound local name to the return.

« older | 2020-06-30 08:30:17