There are a lot of tutorials out there on how to write a simple interpreter or a compiler, and there is most
certainly no need for just another one of that kind. This tutorial is the opposite: it tells how to write a
complex compiler with all the bells and whistles, while still leaving the unimportant details out and
keeping things easy to understand.
As a source language we're using a very simple, somewhat functional language. It is eager and expression--oriented,
with no explicit destructive assignment to variables; functions may have I/O and memory side effects
(via some impure intrinsics). There are no explicit looping constructs besides recursion, and tail recursion
optimisation is guaranteed by the compiler.
We build a compiler on top of this simple language, and then extend the core language with more features
added as syntax extensions and compile-time macros, demonstrating a number of important approaches
working together.
There is no intention to write just another short and dense compiler. We're fully embracing
the nanopass style of compiler construction, therefore the code is deliberately verbose; on the other hand,
every step made in the compilation pipeline is easy to understand and easy to reason about.
This compiler takes its eager functional source language and reduces it to an imperative
low level intermediate representation, so both functional and imperative compilation techniques
are covered here, with more emphasis on the imperative side (e.g., we do not go the CPS way,
using an SSA representation instead).
Naming languages is hard and does not make much sense here, so we'll simply call our source language lang0,
with all the subsequent IR names derived from this notation.
Topics covered in this tutorial include (but are not limited to):
This is also a showcase for a range of new MBase features - recform AST support, the abstract SSA library,
AST pretty printing, etc.
Initial AST
The language starts with a definition of its Abstract Syntax Tree. This is the very first AST in our compilation
chain; it is generated directly by the parser frontend and contains some syntax sugar that ought to be
eliminated straight away.
Since this is a "functional" (sort of) language, everything is an expression. We will split statements and
expressions further down the compilation pipeline.
ast lang0src : recform {
   // Toplevel statements - definitions and raw expressions
   top = define(ident:id, expr:e)
       | defmacro(ident:id, expr:e)
       | expr(expr:e)
       | debug(.*any:es)
       ;
   // All possible expressions
   expr = lambda(*ident:args, expr:body)
        | seq(.*expr:body)
        // This one does not come from a parser, but is
        // produced by a lexical scoping pass later on.
        | elambda(*ident:args, lenv:benv, lenv:fenv, expr:body)
        // And the following is for macros
        | toplift(top:t)
        | quasiquote(expr:e)
        | unquote(expr:e) // only valid inside a quasiquote expression
        ;
   // All possible constant literals
   const = number(any:n)
         | string(any:s)
         | symbol(any:s)
         | nil()
         ;
   // A let binding node
   letpair = p(ident:id, expr:v);
   ident = unquote(expr:e)
         | v(vident:id)
         ;
}
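If the recform notation looks unfamiliar, the same shape can be sketched as plain OCaml variant types. This is an illustration only, not MBase code, and the lenv representation here is my assumption:

(* A sketch of the lang0src AST as mutually recursive OCaml variants. *)
type ident =
  | IUnquote of expr            (* unquote in identifier position *)
  | IV of string                (* a plain variable identifier *)

and constant =
  | Number of int
  | String of string
  | Symbol of string
  | Nil

and letpair = P of ident * expr (* one let binding: name = value *)

and lenv = (string * string) list (* bound/free variable lists (assumed) *)

and expr =
  | Lambda of ident list * expr
  | Seq of expr list
  | Elambda of ident list * lenv * lenv * expr (* added by lexical scoping *)
  | Toplift of top              (* macro support *)
  | Quasiquote of expr
  | Unquote of expr             (* only valid inside a quasiquote *)
  | Const of constant

and top =
  | Define of ident * expr
  | Defmacro of ident * expr
  | Expr of expr
  | Debug of expr list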
Very often we will have to define a new AST which is only slightly different from
another one. It does not make sense to copy-paste the same thing over and over
again. Instead, new ASTs should be rewritten from the existing ones.
The language after macro expansion will be the following:
As we have defined the structure of this language, we can start building a parser for it.
Parsers in PFront are based on an extended PEG. Normally, a new language
parser will inherit one of the existing parsers, to avoid redefining common stuff
like whitespace, comments, identifiers, number and string literals and so on. In
this case we're inheriting the PFront language entirely, though we only use a few
primitive nodes from it.
Let's not dwell on it, ok? Far too many compiler tutorials are stuck on
parsing, and parsing is the least important part of a language implementation.
parser plang0 (pfront) {
   // A target AST produced by this parser
   target lang0src;
   // Which node should be used as a whitespace - inherited from PFront
   !!Spaces;
   // Rules - for syntax highlighting and for separating keywords
   // from identifiers
   [lexical:] ⇐ [lexical] ⇒ {ctoken = lexic};
   [keyword:] ⇐ [keyword] ![IdentRest] ⇒ {ctoken = keyword};
   // Top entry node - returns a list of lang0 toplevel statements.
   // Note the trailing whitespaces - it is important to add them to a
   // top node, since the whitespaces are only implicitly ignored at
   // beginnings of tokens.
   plang0 ⇐ eslist<[l0topexpr]>:ts [Spaces]* ⇒ ts;
Logically, this is the next step straight after parsing. It is, however, a
relatively advanced subject that should probably be skipped at first, going straight
to the syntax sugar lowering. Feel free to come back and
review this section after reading about the runtime-compiler feedback loop.
At this moment we don't have any feedback loop; it can only be defined later,
when the rest of the compiler pipeline is finalised. We'll leave an empty stub
here meanwhile.
Unfortunately, macros imply a very early exposure to the concept of an environment,
which, again, we'd prefer to leave for much later down the pipeline. We'll keep
a dummy stub here instead.
function l0_env_find_macro(globenv, fn) {
   getid(fn) =
      visit:lang0src /rec/ (expr: fn) {
         deep expr {
            var → id
          | else → ∅};
         deep ident {
            v → id
          | else → ∅}};
   (^l0_env_find_macro_hook)(globenv, getid(fn))}
In order to be able to implement macros we need access to AST-constructing
functions. The easiest way of abstracting them out nicely is to use
quasiquotation: any AST inside a quasiquote is translated into
AST-constructing code that calls the appropriate runtime library functions.
The expression expansion pass handles the quasiquote nodes (consider
quasiquote a special kind of "macro") and the user-defined macros applied via the
compiler feedback loop.
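To make the idea concrete, here is a minimal quasiquote expander in OCaml (not PFront) over a toy AST. The mk_sym/mk_str/mk_app constructor names are hypothetical stand-ins for the runtime library functions, and the real pass also has to track quasiquote nesting levels, which is glossed over here:

(* A quoted tree becomes code that *constructs* that tree at runtime;
   unquote splices the user's expression in unchanged. *)
type expr =
  | Sym of string
  | Str of string
  | App of expr * expr list
  | Quasiquote of expr
  | Unquote of expr

(* expand e = an expression that, when evaluated, builds e *)
let rec expand (e : expr) : expr =
  match e with
  | Sym s -> App (Sym "mk_sym", [Str s])
  | Str s -> App (Sym "mk_str", [Str s])
  | App (f, args) -> App (Sym "mk_app", expand f :: List.map expand args)
  | Unquote inner -> inner            (* escape hatch: keep user code *)
  | Quasiquote inner -> expand inner  (* nested quasiquote: simplified *)

let expand_expr (e : expr) : expr =
  match e with
  | Quasiquote inner -> expand inner
  | other -> other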
Now all the quasiquotations, macro applications and top lift nodes are eliminated, and we can turn our AST
into a form suitable for the rest of the compiler pipeline.
function l0_postexpand_sanitise(ts)
   map t in ts do
      visit:lang0src /rec, dst lang0/ (top: t) {
         deep expr {
            toplift → ccerror('IMPOSSIBLE'())
          | quasiquote → ccerror('IMPOSSIBLE'())
          | unquote → ccerror('IMPOSSIBLE'())
          | else → node()};
         deep ident {
            unquote → ccerror('IMPOSSIBLE'())
          | v → id}}
For the compiler pipeline demo purposes we need a dummy environment:
Our first AST is designed for parsing, but it is not very useful for further analysis, optimisation and
code generation. Before we start actually compiling anything we have to eliminate some syntax sugar and make this
AST simpler.
Eliminating syntax sugar
The two-argument application node was only introduced to simplify binary expression parsing and must go.
This is our first introduction to visitors - a fundamental building block in PFront. A visitor walks
over an AST in a defined order and builds a new AST, transforming certain nodes with the given expressions. In this
case we only rewrite the apply2 variant of the expr node, with all the boilerplate covering all possible
paths to expr nodes being inferred automatically.
Variant constructors like mk:apply are context-dependent: in this visitor we know that the resulting
AST is the same as the original AST, so mk:apply constructs lang0:expr:apply variants.
This function takes a list of toplevel statements as an argument, and returns a list of transformed toplevel
statements.
function l0_eliminate_apply2(ts)
   // 'ts' is a list of top statements,
   // we'll visit them one by one
   map t in ts do
      // A visitor block:
      //  * The source AST is 'lang0', destination is the same
      //  * It's a recform AST (not an old list-based one)
      //  * The entry point is a 'top' node
      visit:lang0 /rec/ (top: t) {
         // All the 'expr' nodes are visited in a depth-first order
         deep expr {
            // All the 'apply2' variants of 'expr' are rewritten
            // as 'apply' variants
            apply2 → mk:apply(fn = mk:var(fn), args = [L; R])
            // All the other variants will remain the same
          | else → node()}}
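For comparison, here is the same rewrite spelled out by hand in OCaml (illustrative types and names, not part of the real compiler). It also shows how much traversal boilerplate the visitor infers for us:

(* App2, a parser convenience for binary operators, becomes a generic
   Apply with a two-element argument list. *)
type expr =
  | Var of string
  | App2 of string * expr * expr   (* op, lhs, rhs - parser sugar *)
  | Apply of expr * expr list
  | Const of int

let rec eliminate_apply2 (e : expr) : expr =
  match e with
  | App2 (op, l, r) ->
      Apply (Var op, [eliminate_apply2 l; eliminate_apply2 r])
  | Apply (f, args) ->
      Apply (eliminate_apply2 f, List.map eliminate_apply2 args)
  | Var _ | Const _ -> e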
A two-armed if (i.e., one with no else branch) does not make sense in an expression--based language, so it must go:
the implicit false branch returns a nil constant.
And a list constructor is just a sequence of nested cons applications. We do not care whether it is entirely
constant at this stage; all the optimisations can be done later.
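Both desugarings fit in a few lines; here is an OCaml sketch (illustrative AST, with cons applied through a hypothetical Var "cons"):

(* An if with a missing else branch gets an explicit nil arm, and a
   list literal becomes nested cons applications. *)
type expr =
  | If2 of expr * expr            (* sugar: if without an else branch *)
  | If3 of expr * expr * expr
  | ListLit of expr list          (* sugar: [a; b; c] *)
  | Apply of expr * expr list
  | Var of string
  | Nil

let rec desugar (e : expr) : expr =
  match e with
  | If2 (c, t) -> If3 (desugar c, desugar t, Nil)
  | If3 (c, t, f) -> If3 (desugar c, desugar t, desugar f)
  | ListLit xs ->
      List.fold_right
        (fun x acc -> Apply (Var "cons", [desugar x; acc]))
        xs Nil
  | Apply (f, args) -> Apply (desugar f, List.map desugar args)
  | Var _ | Nil -> e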
This is a lexically scoped language, allowing new definitions to override an outer scope. This is a default
design decision for languages with pattern matching, for example. Our core language does not have any pattern
matching support, but it can be added later with macros and syntax extensions.
The simplest approach to implementing lexical scope is to rename all the variables to make them unique.
In a subsequent pass we'll collect variable and argument declarations and sort everything into categories -
global variables, local variables and lambda arguments. Another category, the closure variable, will appear
later (after the lambda lifting pass).
// A helper function, returns a new name if it is in the current
// environment or id unchanged (meaning it's a global name)
renamevar(venv, id) =
   do iloop(e = venv) {
      match e with
         [(iid when iid === id); nnm] : tl → nnm
       | hd : tl → iloop(tl)
       | else → id};
// A helper function to access an identifier bound
// in a let pair
getpident(p) =
   visit:lang0 /rec/ (letpair: p) {
      once letpair {p → id}};
// A helper function: loop into a right hand side
// of a let pair and rename a bound identifier
loopps(ne, p) =
   visit:lang0 /rec/ (letpair: p) {
      deep letpair {
         p → mk:node(id = renamevar(ne, id),
                     v = loop(v, env))}};
// Main visitor: for lambda and let nodes it
// stops and recurses into their sub-nodes explicitly,
// while all the other variants are traversed in the depth-first
// order.
visit:lang0 /rec/ (expr: e) {
   once expr {
      lambda → {
         newenv = map a in args do [a; gensym()];
         addrenames(newenv);
         mk:elambda(args = map [a; b] in newenv do b,
                    benv = env,
                    fenv = ∅,
                    body = loop(body, newenv ⊕ env))}
    | let → {
         newenv = map p in ps do [getpident(p); gensym()];
         addrenames(newenv);
         mk:node(ps = map p in ps do loopps(newenv, p),
                 body = loop(body, newenv ⊕ env))}
    | deep → {var → mk:node(id = renamevar(env, id))
            | else → node()}}}
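The essence of the pass, stripped of the visitor machinery, is ordinary alpha-renaming. A minimal OCaml sketch, assuming a non-recursive let (right-hand sides see only the outer environment, as in loopps above):

(* Every binder gets a fresh name; uses are renamed through the
   environment; unknown names are left alone (treated as globals). *)
type expr =
  | Var of string
  | Lambda of string list * expr
  | Let of (string * expr) list * expr
  | Apply of expr * expr list

let counter = ref 0
let gensym base = incr counter; Printf.sprintf "%s_%d" base !counter

let rec rename (env : (string * string) list) (e : expr) : expr =
  match e with
  | Var v -> Var (try List.assoc v env with Not_found -> v)
  | Lambda (args, body) ->
      let fresh = List.map (fun a -> (a, gensym a)) args in
      Lambda (List.map snd fresh, rename (fresh @ env) body)
  | Let (ps, body) ->
      let fresh = List.map (fun (n, _) -> (n, gensym n)) ps in
      (* right-hand sides see the outer environment only *)
      let ps' =
        List.map2 (fun (_, nn) (_, v) -> (nn, rename env v)) fresh ps in
      Let (ps', rename (fresh @ env) body)
  | Apply (f, args) -> Apply (rename env f, List.map (rename env) args)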
It would be nice to keep track of all the renames we did; it could help
with producing debug information and with giving error messages more context.
Now it's time to stop using the initial AST and move on. We have eliminated a few variants, and are going to introduce
more specific variable variants: global, local, argument and closure variable. We're also introducing variants
for the subsequent lambda lifting.
The new AST is derived from the previous one (lang0), with a few modifications
made to the expr node.
Now we can make our variables more specific, and since all the names are unique we can do it easily in
a depth--first order. Of course we could have done it in the previous pass, but it's better to keep passes as simple
as possible.
function l0_classify_vars_top(t) {
   dict = mkhash();
   // Fill a hash table with origins of all the definitions
   filldict() =
      visit:lang0 /rec/ (top: t) {
         deep expr {
            elambda → iter a in args do ohashput(dict, a, 'arg')
          | else → ∅};
         deep letpair {
            p → ohashput(dict, id, 'local')}};
   filldict();
   // Replace all the var nodes with specific versions
   // and use this opportunity to mark lambda bound variables lists with
   // nearly correct origins. We're still missing closure variables, they will
   // appear later.
   visit:lang0 /rec, dst lang0i/ (top: t) {
      deep expr {
         elambda → mk:node(benv = map [o; n] in benv do
                                     [n; ohashget(dict, n)])
       | var → aif(ct = ohashget(dict, id)) {
                  if (ct === 'arg') mk:arg(id)
                  else if (ct === 'local') mk:local(id)
                  else ccerror('WHAT?'(ct))
               } else mk:global(id)
       | else → node()}}}
The lambda lifting pass will turn all the nested lambda functions into toplevel definitions, capturing closure
environments where necessary. We did the lexical scoping pass previously, so all the names inside any
given context are unique, which makes it easy to produce lists of externally defined variables used under each
lambda context.
First we have to propagate the free variable lists (with the bound variables already collected in the
lexical scoping pass), from top to bottom.
function l0_lambda_lift_freevars(e) {
   // A helper function: returns a list of all
   // identifiers used in a given expression (bound or not)
   collectvars(e) =
      collector(addv, getvs) {
         visit:lang0i /rec/ (expr: e) {
            once expr {
               elambda → {iter v in fenv do addv(v)}
             | deep → {
                  local → addv(id)
                | arg → addv(id)
                | else → ∅}}};
         return getvs()};
   // Add free variable lists to all the lambda nodes,
   // in a depth--first order.
   visit:lang0i /rec/ (expr: e) {
      deep expr {
         elambda → mk:node(fenv = collectvars(body))
       | else → node()}}}
Now we can use the intersection between a bound variable list and
a free variable list to form a closure environment.
function l0_lambda_lift_expr(e, clenv, adddef)
   // An entry for nested lambdas, to keep track of variables
   // that were turned into closure variables in a lifted context.
   do loop(e = e, renv = ∅) {
      // A helper function, generating a list of
      // captured closure variables out of bound and free variables lists
      getclenv(benv, fenv) = with target (lang0i) {
         ht = mkhash();
         iter nm in fenv do ohashput(ht, nm, nm);
         map append [nm; cl] in benv do {
            if (ohashget(ht, nm)) {
               if (memq(nm, renv)) [[nm; mk:expr:clvar(nm)]]
               else if (cl === 'local') [[nm; mk:expr:local(nm)]]
               else if (cl === 'arg') [[nm; mk:expr:arg(nm)]]
               else ccerror('WHAT?'(cl))
            } else ∅}};
      // A main visitor, lifting all the lambda nodes into new
      // toplevel definitions and replacing them with closure allocations,
      // even if no closure environment is captured (we'll deal with this
      // case later). Also using this opportunity to make closure variable
      // references specific.
      visit:lang0i /rec/ (expr: e) {
         once expr {
            elambda → {
               intersect = getclenv(benv, fenv);
               clargs = map [nm; cl] in intersect do nm;
               newlambda = mk:clambda(clenv = clargs,
                                      args = args,
                                      body = loop(body, clargs ⊕ renv));
               newnm = gensym();
               adddef(mk:top:define(newnm, newlambda));
               return mk:mkclosure(newnm, map [nm; cl] in intersect do cl)}
          | local → if (memq(id, renv)) mk:clvar(id) else node()
          | arg → if (memq(id, renv)) mk:clvar(id) else node()
          | deep → {else → node()}}}}
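The underlying computation is a textbook free-variables analysis plus a set intersection. A self-contained OCaml sketch (illustrative AST; the real pass also records whether each captured name was a local or an argument):

(* Collect the variables a lambda body actually uses, intersect them
   with what is bound outside: that intersection is the captured
   closure environment. *)
module S = Set.Make (String)

type expr =
  | Local of string
  | Arg of string
  | Global of string
  | Lambda of string list * expr
  | Apply of expr * expr list

let rec free_vars (e : expr) : S.t =
  match e with
  | Local v | Arg v -> S.singleton v
  | Global _ -> S.empty
  | Lambda (args, body) -> S.diff (free_vars body) (S.of_list args)
  | Apply (f, args) ->
      List.fold_left
        (fun acc a -> S.union acc (free_vars a))
        (free_vars f) args

(* The closure environment of a lambda under the bound set [benv]: *)
let closure_env (benv : S.t) (args : string list) (body : expr) : S.t =
  S.inter benv (S.diff (free_vars body) (S.of_list args))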
Obviously we do not need to lift lambda expressions that are already on top,
so we have to make an exception for this case.
Lambda--lifting a single top level statement may produce any number of new top level statements, so
we'll just collect all the new statements together into one flat list.
By now we have moved all the lambdas into top level definitions, made closure
environments explicit, eliminated lexical scope and reduced some syntax sugar.
Let's refine our AST further. We can classify top level statements as
functions (i.e., lambdas with an empty closure environment), closures,
constants, evaluated definitions and, finally, toplevel expressions.
We also eliminated the temporary elambda variant previously, and are now
going to get rid of nested let bindings, replacing them with flat
imperative assignments. The same AST introduces a special tail call variant.
Another variant introduced here is splitexpr; it is required for
separating expressions and statements further down the pipeline.
Instead of our nice and clean functional variable binding we'll produce an
imperative sequence of destructive assignments here. Some other passes will also
introduce true destructive assignments, e.g., lifting the statement expressions
and optimising direct tail recursion, so we're not losing any purity here. A
subsequent SSA transform will make our IR clean again.
At this stage the IR is suitable for tail call analysis, which would have
been more difficult previously and will be more clumsy later. The results of this
marking will be used further down in the direct tail recursion optimisation and
are required for the statement vs. expression split, since we're also
introducing explicit return statements here.
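Sketched in OCaml, the marking follows only the returning branches (illustrative AST; an application reached in tail position becomes TailApply, everything else is left alone):

(* Walk only the "returning" branches - the last element of a sequence
   and both arms of an if - wrapping applications found there. *)
type expr =
  | Apply of expr * expr list
  | TailApply of expr * expr list
  | If of expr * expr * expr
  | Seq of expr list
  | Var of string

let rec mark_tails (e : expr) : expr =
  match e with
  | Apply (f, args) -> TailApply (f, args)  (* reached in tail position *)
  | If (c, t, f) -> If (c, mark_tails t, mark_tails f)
  | Seq xs ->
      let rec go = function
        | [last] -> [mark_tails last]       (* only the last entry *)
        | x :: rest -> x :: go rest
        | [] -> []
      in
      Seq (go xs)
  | TailApply _ | Var _ -> e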
Next comes a procedure very similar to tail call marking, but for the statement
expressions (like if and seq yielding a value that is used inside
another expression). We'll need it later on when we actually split statements
and expressions.
The following function, just like the tail call marker pass before, will follow
the returning branches of expressions only --- i.e., the last entry in a sequence
and both arms of an if --- but instead of inserting a return or a
tail call marker it will assign that value to a variable or wrap it into a
drop marker, meaning that the result is going to be discarded anyway.
The next function identifies statement--expressions and treats them in a
special way: any value returned from the non--terminal sequence elements is
dropped, and any value of a complex statement--expression that is actually used
is assigned to a variable; a special splitexpr variant is created,
holding both the statement that generates the value and an expression to access
this value. After the split between statements and expressions is completed,
those inner statements will be lifted to their nearest statement scopes.
Some tail calls are tail recursive, and this is the right moment to replace
them with a goto. The language does not have gotos and labels? Not a
problem at all, we'll make a new language now:
If a function is directly tail recursive, we have to create temporary
variables that will initially contain copies of the argument values and then
will be used to pass arguments in a tail--recursive call. In this case all the
argument references in the function body must be replaced with local variable
references.
function l0_tailrec_topexpr(self, fnargs, e) {
   trecp = mkref(∅);
   isself(e) =
      visit:lang0k /rec/ (expr: e) {
         once expr {
            global → id === self
          | else → ∅}};
   rewriteargs(e) =
      visit:lang0k /rec/ (expr: e) {
         deep expr {
            arg → mk:local(id)
          | else → node()}};
   body0 = visit:lang0j /rec, dst lang0k/ (expr: e) {
      deep expr {
         tailapply → if (isself(fn)) {
                        trecp := true;
                        mk:seq(body = [@map [a; v] in zip(fnargs, args) do
                                          mk:set(a, v);
                                       mk:goto('tailrecentry')])
                     } else node()
       | else → node()}};
   if (^trecp) with target (lang0k) {
      mk:expr:seq(body = [@map a in fnargs do mk:alloca(a);
                          @map a in fnargs do mk:set(a, mk:arg(a));
                          mk:goto('tailrecentry');
                          mk:label('tailrecentry');
                          rewriteargs(body0)])
   } else body0}
function l0_tailrec(ts)
   map t in ts do
      visit:lang0j /rec, dst lang0k/ (top: t) {
         once top {
            sfunction → mk:node(body = l0_tailrec_topexpr(id, args, body))
          | deep → {else → node()}};
         once expr {
            // We still have to do this pass even if it's not a function,
            // because we're moving to another AST version here.
            else → l0_tailrec_topexpr('*dummy*', ∅, node())}}
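To see what the rewrite buys us, here is a before/after rendering in OCaml of a hypothetical sum function. The while loop over mutable argument copies is the closest runnable equivalent of the alloca/set/goto sequence the pass emits:

(* Source (lang0-like, hypothetical syntax):
     sum(n, acc) = if n == 0 then acc else sum(n - 1, acc + n)
   After the rewrite, the self tail call becomes "update the local
   copies of the arguments, jump back to tailrecentry": *)
let sum (n0 : int) (acc0 : int) : int =
  let n = ref n0 and acc = ref acc0 in    (* allocas + initial copies *)
  let result = ref 0 in
  let running = ref true in
  while !running do                       (* tailrecentry: *)
    if !n = 0 then begin
      result := !acc;                     (* return acc *)
      running := false
    end else begin
      let n' = !n - 1 and acc' = !acc + !n in
      n := n'; acc := acc'                (* set args, goto tailrecentry *)
    end
  done;
  !result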
By now our source functional language has been transformed into an imperative one,
with explicitly allocated variables, destructive assignments and properly marked
tail calls. Nested lambda functions and lexical scope have been eliminated. There is
already a clear separation between statements and expressions that occurred
naturally during the previous passes. Goto, seq, drop, alloca, set, if, return
and tailapply are statements, while everything else is an expression.
If our target were a stack machine, it would have been possible to keep an
expression-based IR all the way down. Unfortunately, stack machines are not
quite fit for optimisations and analysis, and we're not going down the CPS route
either.
In an SSA-based IR we have to *assign* results to named registers. Control
flow statements must be handled differently from the expressions yielding
results, and for this reason we're starting this split here.
This is an important compiler construction pattern that occurs even in
languages that distinguished statements from expressions from the beginning.
We can formalise this split now. The new AST lang0k defines everything
that returns a value as an expression and every control flow construct as a
statement. An if3 statement does not return anything any longer; we
ensured this by assigning its return value to a variable previously, in the
preparation phase.
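Roughly, the split produces the following two-sorted IR (an OCaml sketch with illustrative names; the compound if is still a statement at this point and becomes gotos later):

(* Values live in expressions, control flow lives in statements. *)
type expr =
  | Const of int
  | Var of string
  | Apply of expr * expr list

type stmt =
  | Seq of stmt list
  | Alloca of string
  | Set of string * expr
  | If of expr * stmt * stmt        (* if3: no longer yields a value *)
  | Goto of string
  | Label of string
  | Return of expr
  | TailApply of expr * expr list
  | Drop of expr                    (* evaluate, discard the result *)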
The following pass does not change the tree structure (besides renaming the
toplevel expr to eval); it merely sorts the variants into their new
nodes.
With a clean and simple low level imperative IR we can now turn to a
different set of compiler construction tools. First we'll lower this IR into an
SSA form and apply all the standard SSA-based torture techniques to it.
The first step in flattening produces almost the same IR, with all the
complex expressions broken down into register assignments. We could have used
local variable assignments here, but since we're going into SSA soon anyway
it does not matter that "registers" are introduced a bit earlier.
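The flattening itself is the classic expression-to-three-address-code transform. A minimal OCaml sketch, assuming every operation is a named function applied to values:

(* Every compound argument is hoisted into a fresh register definition,
   leaving only "reg = fn(values)" instructions. *)
type value = Reg of string | Const of int

type expr =
  | Value of value
  | Apply of string * expr list     (* fn name, compound arguments *)

type insn = Def of string * string * value list  (* reg = fn(values) *)

let counter = ref 0
let fresh () = incr counter; Printf.sprintf "r%d" !counter

(* Returns the value standing for [e] plus the instructions computing
   it, in execution order. *)
let rec flatten (e : expr) : value * insn list =
  match e with
  | Value v -> (v, [])
  | Apply (fn, args) ->
      let rvals, code =
        List.fold_left
          (fun (vs, cs) a ->
             let v, c = flatten a in
             (v :: vs, cs @ c))
          ([], []) args
      in
      let r = fresh () in
      (Reg r, code @ [Def (r, fn, List.rev rvals)])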
Note the lock node here - it is merely an optimisation, ensuring that the
visitor does not descend into already flattened code.
Getting ready to descend to SSA. This involves adding explicit load and store
instructions for the local stack--allocated variables, so we'll lift local
references alongside compound expressions here.
Now we can get rid of the compound statements, replacing them with
gotos and labels. At the moment we only have one kind of compound
statement - an if.
The previous passes could produce nested seq blocks. Since this is the
only compound statement left by now, we can just flatten everything into a single
topmost sequence.
Now everything is perfectly flat, so we can sort instructions into basic
blocks. It is also the right moment to do another node split - separating compound
expressions from leaf values; it will help to enforce the SSA guarantee of never
reassigning simple values (with the only exception of the $\varphi$ nodes, but
this will be sorted out later).
Note that tailapply is gone now, as it has served its purpose already;
it is replaced with an expr node, which will be useful later when we have to
annotate calls.
The first pass over the flat code eliminates redundancy: e.g., if there is a
sequence of labels, only one should remain (and be referred to), because a basic
block can only have one name. If a terminal instruction follows another
terminal instruction, it is unreachable and must be eliminated.
Once this filtering is done, the rules for accumulating instructions into basic
blocks are much simpler.
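The accumulation rules are indeed simple once the stream is filtered; a sketch in OCaml (illustrative instruction set; a fall-through into a label is closed with an explicit goto, just in case):

(* Cut a flat stream into "label, body, terminator" triples. Assumes
   the redundancy filtering already ran: no duplicate labels, no
   unreachable tails after terminators. *)
type insn =
  | Label of string
  | Op of string                    (* any ordinary instruction *)
  | Goto of string
  | Ret

type bblock = { lbl : string; body : insn list; term : insn }

let to_bblocks (code : insn list) : bblock list =
  let rec blocks lbl acc = function
    | (Goto _ | Ret) as t :: rest ->
        { lbl; body = List.rev acc; term = t } :: start rest
    | Label l :: rest ->
        (* fall-through into a label: close with an explicit goto *)
        { lbl; body = List.rev acc; term = Goto l } :: blocks l [] rest
    | i :: rest -> blocks lbl (i :: acc) rest
    | [] -> [{ lbl; body = List.rev acc; term = Ret }]
  and start = function
    | Label l :: rest -> blocks l [] rest
    | [] -> []
    | rest -> blocks "entry" [] rest
  in
  start code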
function l0_seq_body(s)
   visit:lang0flat1 /rec/ (stmt: s) {
      once stmt {
         seq → body
       | else → ccerror('IMPOSSIBLE'())}}
Now we're all set for an SSA--transform (i.e., memory to register). For this
we'll use a specific simplified IR, not to be confused with the more complex
Abstract SSA we'll use later.
function l0_to_genssa(c)
   collector(liftadd, liftget)
   collector(addalloca, getallocas) {
      vals = mkhash();
      addval(nm, v) = ohashput(vals, nm, v);
      newlift(v) =
         symbols(id) {addval(id, v); liftadd([id; 'use'()]); id};
      getexitsterm(t) =
         collector(add, get) {
            visit:lang0flat2 /rec/ (term: t) {
               once labident : add(thisnodesrc())};
            return get()};
      getbbexits(bb) =
         collector(add, get) {
            visit:lang0flat2 /rec/ (bblock: bb) {
               once term {else → add(getexitsterm(node()))}};
            return car(get())};
      getcode(t) =
         visit:lang0flat2 /rec/ (term: t) {
            once term {
               tail → symbols(retn) [[retn; e]]
             | else → ∅}};
      gs = visit:lang0flat2 /rec/ (code: c) {
         deep code {c → bs};
         deep bblock {
            bb → 'b'(lbl,
                     liftget() ⊕ (map append b in body do b) ⊕ getcode(t),
                     getbbexits(thisnodesrc()))};
         deep insn {
            alloca → {addalloca(id); ∅}
          | def → [[id; v]]
          | set → [[gensym(); 'store'(id, v)]]
          | drop → ∅};
         deep expr {
            load → 'load'(id)
          | apply → 'use'(fn, @args)
          | mkclosure → 'use'(@args)};
         deep value {
            reg → id
          | else → newlift(node())}};
      allocas = getallocas();
      // Use the library function to run mem2reg pass on a generic SSA:
      nssa = ssa_transform(gs, allocas);
      return vals : nssa}
Back from the generic SSA
After the generic SSA transform we have all the allocas, loads and stores
eliminated, new phi nodes introduced and equivalent registers renamed. Now
we have to apply the results back to our original AST.
// Generic SSA transform collected register renames, we
// have to follow them here. Also, lifted constant values were
// given temporary register names in the previous transform,
// we have to recover them back.
getrenamed0(n) = {
   do loop(x = ohashget(vmap, n), p = n) {
      if (x) return loop(ohashget(vmap, x), x)
      else return p}};
getrenamed(n) = with target (lang0flat2) {
   nreg = getrenamed0(n);
   aif (chk = ohashget(valsht, nreg)) {
      chk
   } else mk:value:reg(nreg)};
isrenamed(n) = ohashget(vmap, n);
// Now we can collect the phi nodes from the generic SSA representation,
// and then add them to their corresponding source basic blocks.
phis = mkhash();
fillphis() =
   visit:genssa (code: ngen) {
      deep bblock {b → iter ops do ops(name)};
      deep oppair : λ(bb) {op(bb, name)};
      deep iop {
         phi → λ(bb, tgt) with target (lang0flat2) {
            nphi =
               mk:insn:def(tgt,
                  mk:phi(ss = map [f; v] in zip(prevs, vals) do
                                 mk:s(f, getrenamed(v))));
            ohashput(phis, bb,
                     ohashget(phis, bb) ⊕ [nphi])}
       | else → λ(bb, tgt) ∅}};
fillphis();
getphis(lbl) = ohashget(phis, lbl);
// Apply all the changes to the original lang0flat2 code
visit:lang0flat2 /rec/ (code: c) {
   deep bblock {
      bb → {
         phis = getphis(lbl);
         mk:node(body = phis ⊕ map append b in body do b)}};
   deep insn {
      alloca → ∅
    | set → ∅
    | def → if (isrenamed(id)) ∅ else [node()]
    | else → [node()]};
   deep value {
      reg → getrenamed(id)
    | else → node()}}}
And a simple wrapper function to run the generic SSA transform over top level entries:
And the control flow graph for this example is the following:
Abstract SSA
The next few sections are entirely optional. We already have an IR suitable
for code generation, but it is also fit for SSA-based optimisations. MBase provides an
abstract SSA library which, with the help of a user--defined IR model, provides,
among others, the following transforms:
Some of these optimisations are enabling ones and may be useful for our code
generation even if the target platform is capable of doing similar
optimisations on a lower level.
Keep in mind that Abstract SSA is an old, list--form AST, and by converting
our nice recform AST into it we're discarding any associated metadata (most
importantly, source code locations). Also, some semantics is lost and must be
recovered on the way back (e.g., both arguments and closure variables are
represented as registers).
We did not have any select instructions before, but they may appear
after the Abstract SSA optimisations. The backend must be aware of this.
function l0_to_abstract_ssa_code(c) {
   isintrinsicfn(id) = {
      case (id) {
         'binopadd' | 'binopsub' | 'binopmul' | 'binopdiv'
       | 'binopeq' | 'binopneq' → id
       | else → ∅}};
   isintrinsic(f) =
      {match f with
          'glob'(id) → isintrinsicfn(id)
        | else → ∅};
   visit:lang0flat2 /rec/ (code: c) {
      deep code {c → bs};
      deep bblock {
         bb → {
            <te : t1> = t;
            'b'(lbl, body ⊕ te, t1)}};
      deep insn {
         def → [id; v]
       | else → ccerror('IMPOSSIBLE'())};
      deep term {
         goto → ∅ : 'br'(id)
       | gotoc → ∅ : 'brc'(cnd, tr, fl)
       | return → [[gensym(); 'call'(['other'('src'(thisnodesrc()))],
                                     '*return*', v)]] : 'none'()
       | tail → symbols(ret)
            [[ret; e];
             [gensym(); 'call'(['other'('src'(thisnodesrc()))],
                               '*tailreturn*', 'var'(ret))]] : 'none'()};
      deep expr {
         mkclosure → 'call'(['other'('src'(thisnodesrc()))],
                            '*mkclosure*', 'glob'(df), @args)
       | apply → aif (intr = isintrinsic(fn)) {
                    'call'(['other'('src'(thisnodesrc()))], intr, @args)
                 } else 'call'(['other'('src'(thisnodesrc()))],
                               '*funcall*', fn, @args)
       | load → ccerror('IMPOSSIBLE'())
       | phi → 'phi'(@ss)
       | select → 'select'(cnd, tr, fl)};
      deep phisrc {
         s → 'a'(from, v)};
      deep value {
         const → 'const'('any'(), c)
       | global → 'glob'(id)
       | arg → 'var'(id)
       | clvar → 'var'(id)
       | reg → 'var'(id)
       | funref → 'glob'(id)}}}
The Abstract SSA implementation is IR-agnostic and therefore must be provided
with some hints about the nature of this particular IR. We have to be able to
tell pure intrinsics from those that might have side effects in order to
get constant folding working. We have to provide a way to evaluate those
intrinsics when all of their arguments are constant. We must tell how to create
a boolean constant and how to tell whether a constant is a boolean truth. All such
things form a model which complements the abstract SSA passes.
Since this language allows compile time macros, we need two different backends -
one for emitting the final code, and another for interpreting the code that might
be used by the macros.
For the latter we'll implement a primitive .NET backend, partly because it's an
opportunity to showcase a few patterns in compiling an SSA IR into a stack-based
target. It will consist of the following steps:
Out of SSA: we'll simply eliminate all the phi nodes and replace them with
loads and stores placed accordingly.
Back to trees: collapse the register assignment sequences into expression
trees where possible without affecting the effect order.
Register elimination: all the register assignments left after the previous
pass must now be converted into local variables (with explicit loads and
stores).
Stack machine codegen (see the sketch below).
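The last step is the pleasant one: once expressions are trees again, stack machine code is just a postfix walk. A minimal OCaml sketch with a made-up slice of the .NET instruction set:

(* Postfix emission: operands first, operation last - exactly the
   evaluation-stack discipline the .NET backend needs. *)
type expr =
  | Const of int
  | LocalRef of string
  | Call of string * expr list

type stackop =
  | LdcI4 of int                    (* push an int constant *)
  | Ldloc of string                 (* push a local variable *)
  | CallOp of string * int          (* pop n arguments, push the result *)

let rec emit (e : expr) : stackop list =
  match e with
  | Const n -> [LdcI4 n]
  | LocalRef v -> [Ldloc v]
  | Call (fn, args) ->
      List.concat_map emit args @ [CallOp (fn, List.length args)]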
One might ask: why did we go through all those complex rewrite steps, only to get back
to the expression trees again? Firstly, the expression trees are a bit different
now - they do not contain any control flow inside, it is broken into basic blocks
already. Secondly, we had an opportunity to do all the SSA-based transforms, which are
not possible on a primitive expression tree based IR.
And, also, it is a bit ironic, but we're doing redundant work here by making
our IR stack-friendly - the .NET engine will quickly turn it back into SSA again
at the first opportunity.
Out of SSA
function l0_naive_phi_removal(c)
   collector(newvaradd, newvarsget) {
      srcbbs = mkhash();
      // 1. Collect phi variables and the values to
      //    be assigned to them.
      getphis(c) =
         visit:lang0flat2 /rec/ (code: c) {
            deep insn {def → v(id) | else → ∅};
            deep term {tail → e('*tail*') | else → ∅};
            deep expr (dst) {
               phi → {newvaradd(dst);
                      iter [s; v] in ss do {
                         ohashput(srcbbs, s,
                                  [dst; v] : ohashget(srcbbs, s))}}
             | else → ∅};
            deep phisrc {s → [from; v]}};
      getphis(c);
      // 2. Rewrite the phi origin basic blocks and
      //    add all the allocas to the 'entry' basic block.
      visit:lang0flat2 /rec/ (code: c) {
         deep bblock {
            bb → {
               pfx = if (lbl === 'entry') {
                        map v in newvarsget() do {
                           vnm = %Sm<<(v, "_alloc");
                           mk:insn:alloca(vnm)}} else ∅;
               sfx = {
                  chk = ohashget(srcbbs, lbl);
                  map [dst; v] in chk do
                     mk:insn:set(%Sm<<(dst, "_alloc"), v)};
               mk:node(body = pfx ⊕ body ⊕ sfx)}};
         deep term {tail → e('*tail*') | else → node()};
         deep insn {def → mk:node(v = v(id)) | else → node()};
         deep expr (dst) {
            phi → mk:load(%Sm<<(dst, "_alloc"))
          | else → node()}}}
The only cases where a register is set to load(something) come from the
previous pass here, and it is safe to demote all such loads to values.
function l0_bblock_to_trees(cnt, bb)
   collector(ordlistadd, ordget)
   collector(useadd, useget) {
      // 1. Collect the register definitions and
      //    fill the ordered list.
      defsht = mkhash(); ordered = mkhash();
      ordadd(id) = {ordlistadd(id); ohashput(ordered, id, id)};
      fillht() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            once insn {
               def → {
                  ohashput(defsht, id, v);
                  if (l0_is_ordered(v)) ordadd(id);}
             | else → ∅}};
      fillht();
      // 2. See which register references defy The Order.
      mkord() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            once value {
               reg → if (ohashget(ordered, id)) useadd(id)
             | else → ∅}};
      mkord();
      outoford = mkhash();
      do loop(defs = ordget(), uses = useget()) {
         if (not(defs) || not(uses)) ∅
         else {
            d = car(defs); u = car(uses);
            if (d === u) loop(cdr(defs), cdr(uses)) else {
               ohashput(outoford, d, d);
               loop(cdr(defs), uses) // TODO!!! wrong!!!
            }}};
      // 3. Walk in the reverse order and substitute the single-use definitions,
      //    as long as this substitution does not break an order.
      getinsns() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            deep bblock {
               bb → t ⊕ reverse(body)};
            once term {
               tail → [mk:insn:def('_', e)]
             | gotoc → [mk:insn:def('_', mk:value(cnd))]
             | return → [mk:insn:def('_', mk:value(v))]
             | else → ∅}};
      removedht = mkhash();
      removed(id) = ohashget(removedht, id);
      insns = getinsns();
      process_expr(e0) =
         do loop(e = e0) {
            visit:lang0flat2x /rec, dst lang0unflat1/ (expr: e) {
               once value {
                  reg → if (ohashget(cnt, id) == 1) {
                           // is it defined in the same bb?
                           aif (chk = ohashget(defsht, id)) {
                              if (not(ohashget(outoford, id))) {
                                 ohashput(removedht, id, id);
                                 mk:value:expr(loop(chk))
                              } else node()
                           } else node()
                        } else node()
                | else → node()}}};
      newinsns = map append i in insns do
         visit:lang0flat2x /rec, dst lang0unflat1/ (insn: i) {
            once insn {
               def → if (removed(id)) ∅
                     else [mk:node(v = process_expr(v))]
             | set → [mk:node(v = mk:expr(process_expr(mk:expr:value(v))))]
             | else → [node()]}};
      getexpr(i) = {
         ret = mkref();
         visit:lang0unflat1 /rec/ (insn: i) {
            once expr {else → ret := node()}};
         return ^ret};
      getvalue(i) = {
         ret = mkref();
         visit:lang0unflat1 /rec/ (insn: i) {
            once value {else → ret := node()}};
         return ^ret};
      reapplyinsns(lst) =
         visit:lang0flat2x /rec, dst lang0unflat1/ (bblock: bb) {
            deep bblock {
               bb → {
                  <chgp : nt> = t;
                  println('lbl'(lbl, chgp));
                  tmp = gen_pprint_ast(lang0unflat1, insn);
                  iter x in lst do println(%S<<(":: ", tmp(x)));
                  mk:node(body = reverse(if (chgp) cdr(lst) else lst),
                          t = nt)}};
            deep term {
               tail → true : mk:node(e = getexpr(car(lst)))
             | gotoc → true : mk:node(cnd = getvalue(car(lst)))
             | return → true : mk:node(v = getvalue(car(lst)))
             | else → ∅ : node()}};
      return reapplyinsns(newinsns)}
function l0_bblocks_to_trees(c) {
   // 1. We can only move the single use registers. Let's count the number of
   //    uses for all the registers.
   countht = mkhash();
   count(id) = {nxt = aif (chk = ohashget(countht, id)) chk + 1 else 1;
                ohashput(countht, id, nxt)};
   mkcount() =
      visit:lang0flat2x /rec/ (code: c) {
         once value {reg → count(id) | else → ∅}};
   mkcount();
   // 2. For each basic block we maintain the order of expressions that
   //    cannot be reordered and move everything else freely.
   mktree() =
      visit:lang0flat2x /rec, dst lang0unflat1/ (code: c) {
         once bblock {bb → l0_bblock_to_trees(countht, node())}};
   nxt = mktree();
   // 3. Use the count table again, to remove the unused register names
   visit:lang0unflat1 /rec/ (code: nxt) {
      deep insn {
         def → if (not(ohashget(countht, id))) mk:node(id = ∅)
               else node()
       | else → node()}}}
No more registers
This is just a nominal pass by now - demote the remaining registers to loads/stores
and issue explicit allocas for them.
We managed to keep the compiler clean and self-contained up until now, but for a
real backend we have to deal with a runtime library and a global environment, so
this final pass will have to keep looking things up and keep track of the
classes, fields and methods it generates.
The environment contains references to some core library methods (e.g., a select
function), FFI methods, the already compiled classes and methods and, in another
layer, the fields and methods of the class that is being compiled now.
.NET codegen
The general approach is simple - for every chunk of code (a number of
definitions) a class is created. Constants are stored in static fields, and all
functions are turned into static methods. A special 'init' method is generated
to fill all the static fields. We're not using static constructors here because
some side effects of this init method may not touch any of this class's
static fields but affect other global variables instead, so we have to
ensure the initialisation order explicitly.