Constructing a compiler

There are a lot of tutorials out there on how to write a simple interpreter or a compiler, and there is certainly no need for yet another one of that kind. This tutorial is the opposite: it tells how to write a complex compiler with all the bells and whistles, while still leaving the unimportant details out and keeping things easy to understand.

As a source language we're using a very simple, somewhat functional language. It's eager and expression-oriented, with no explicit destructive assignment to variables; functions may have I/O and memory side effects (via some impure intrinsics). There are no explicit looping constructs besides recursion, and tail recursion optimisation is guaranteed by the compiler.

We build a compiler on top of this simple language, and then extend the core language with more features added as syntax extensions and compile-time macros, demonstrating a number of different important approaches used together.

There is no intention to write yet another short and dense compiler. We're fully embracing the nanopass style of compiler construction, so the code is deliberately verbose; on the other hand, every step in the compilation pipeline is easy to understand and easy to reason about.

This compiler takes its eager functional source language and reduces it to an imperative low-level intermediate representation, so both functional and imperative compilation techniques are covered here, with more emphasis on the imperative side (e.g., we do not go the CPS way, using an SSA representation instead).

Naming languages is hard and does not make much sense here, so we'll simply call our source language lang0, with all the subsequent IR names derived from this notation.

Topics covered in this tutorial include (but are not limited to):

This is also a showcase for a range of new MBase features - recform AST support, an abstract SSA library, AST pretty printing, etc.

Initial AST

A language starts with a definition of its Abstract Syntax Tree. This is the very first AST in our compilation chain; it is generated directly by the parser frontend and contains some syntax sugar that ought to be eliminated straight away.

Since this is a "functional" (sort of) language, everything is an expression. We will split statements and expressions further down the compilation pipeline.

ast lang0src recform {
  // Toplevel statements - definitions and raw expressions
  top = define(ident:id, expr:e)
      | defmacro(ident:id, expr:e)
      | expr(expr:e)
      | debug(.*any:es)
      ;
  // All possible expressions
  expr = lambda(*ident:args, expr:body)
       | seq(.*expr:body)

       | const(const:c)
       | var(ident:id)
       | apply(expr:fn, .*expr:args)
       | apply2(vident:fn, expr:L, expr:R)
       
       | if3(expr:cnd, expr:tr, expr:fl)
       | if2(expr:cnd, expr:tr)
       | let(*letpair:ps, expr:body)

       | mklist(*expr:es)

       // This one does not come from a parser, but is
       // produced by a lexical scoping pass later on.
       | elambda(*ident:args, lenv:benv, lenv:fenv, expr:body)

       // And the following is for macros
       | toplift(top:t)
       | quasiquote(expr:e)
       | unquote(expr:e)   // only valid inside a quasiquote expression
       ;
  // All possible constant literals
  const = number(any:n)
        | string(any:s)
        | symbol(any:s)
        | nil()
        ;
  // A let binding node
  letpair = p(ident:id, expr:v);

  ident = unquote(expr:e)
        | v(vident:id)
        ;
}
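For readers unfamiliar with recform ASTs, here is a rough Python analogy of what such a definition describes: each variant is a small record with named fields, and nodes nest freely. The class names are hypothetical stand-ins for the lang0src variants above, not actual MBase machinery.

```python
from dataclasses import dataclass
from typing import List, Any

# Each AST variant becomes a record with the fields named in the grammar.
@dataclass
class Var:
    id: str          # var(ident:id)

@dataclass
class Const:
    c: Any           # const(const:c)

@dataclass
class Apply:
    fn: Any          # apply(expr:fn, .*expr:args)
    args: List[Any]

@dataclass
class If3:
    cnd: Any         # if3(expr:cnd, expr:tr, expr:fl)
    tr: Any
    fl: Any

# A fragment of the foldl example: if3(apply(var nullp, var l), var i, ...)
node = If3(Apply(Var("nullp"), [Var("l")]), Var("i"), Const(None))
```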

Very often we will have to define a new AST which is only slightly different from another one. It does not make sense to copy-paste the same thing over and over again; instead, new ASTs should be derived from the existing ones. The language after macro expansion will be the following:

ast lang0 : lang0src ( ident → oident ) recform {
  top += -defmacro;
  expr += -toplift
       |  -quasiquote
       |  -unquote
       ;
}

Parser

Now that we have defined the structure of this language, we can start building a parser for it.

Parsers in PFront are based on an extended PEG. Normally, a new language parser inherits one of the existing parsers, to avoid redefining common stuff like whitespace, comments, identifiers, number and string literals and so on. In this case we're inheriting the PFront language entirely, though we only use a few primitive nodes from it.

Let's not dwell on it, ok? Far too many compiler tutorials get stuck on parsing, and parsing is the least important part of a language implementation.

parser plang0 (pfront) {
  // A target AST produced by this parser
  target lang0src;
  // Which node should be used as a whitespace - inherited from PFront
  !!Spaces;
  
  // Rules - for syntax highlighting and for separating keywords
  // from identifiers
  [lexical:] ⇐ [lexical] ⇒ {ctoken = lexic};
  [keyword:] ⇐ [keyword] ![IdentRest] ⇒ {ctoken = keyword};

  // Top entry node - returns a list of lang0 toplevel statements.
  // Note the trailing whitespaces - it is important to add them to a
  // top node, since the whitespaces are only implicitly ignored at
  // beginnings of tokens.
  plang0 ⇐ eslist<[l0topexpr]>:ts [Spaces]* ⇒ ts;

  // A single toplevel statement, including standalone expressions
  l0topexpr >> top
     ⇐ { define [l0qident]:name "=" [l0expr]:e ";"? ⇒ define(name, e) }
     /  { function [l0qident]:name "(" ecslist<[l0qident],",">:args ")" [l0expr]:b ⇒
                            define(name, lambda(args, b)) }
     /  { macro [l0qident]:name "(" ecslist<[l0qident],",">:args ")" [l0expr]:b ⇒
                            defmacro(name, lambda(args, b)) }
     /  { [l0expr]:e ⇒ expr(e) }
     ;
  // Binary operators classified by their priority
  p0binop10 ⇐ {"*" ⇒ `binopmul } / {"/" ⇒ `binopdiv };
  p0binop9x  ⇐ {"&&" ⇒ `binopand } / {"&"  ⇒ `binopbinand }
             /  {"||"  ⇒ `binopor } / {"|"  ⇒ `binopbinor };
  p0binop9  ⇐ {"<<" ⇒ `binopshl } / {">>" ⇒ `binopshr };
  p0binop8  ⇐ {"+" ⇒ `binopadd } / {"-" ⇒ `binopsub };
  p0binop7  ⇐ {"<=" ⇒ `binople } / {">=" ⇒ `binopge }
            /  {"==" ⇒ `binopeq } / {"!=" ⇒ `binopneq }
            /  {"<" ⇒ `binoplt } / {">" ⇒ `binopgt };
  // Expressions, starting with a binary. This is compiled as a Pratt parser embedded
  // into a PEG.
  binary l0expr >> expr
                ⇐ (1000) [l0expr] [p0binop10] [l0expr] ⇒ apply2(op, L, R)
                |   (950) [l0expr] [p0binop9x] [l0expr] ⇒ apply2(op, L, R)
                |   (900) [l0expr] [p0binop9]  [l0expr] ⇒ apply2(op, L, R)
                |   (800) [l0expr] [p0binop8]  [l0expr] ⇒ apply2(op, L, R)
                |   (700) [l0expr] [p0binop7]  [l0expr] ⇒ apply2(op, L, R)
                |   (600) [l0expr] "::"  [l0expr] ⇒ apply2(`cons, L, R)
                |   (500) [l0expr] "@"  [l0expr] ⇒ apply2(`append, L, R)
                |   [l0expr1]
                ;
  // Simple expressions
  l0expr1 ⇐
             { "(" [l0expr]:e ")" "(" ecslist<[l0expr],",">:args ")"
                    ⇒ apply(e,@args) }
          /  { "(" [l0expr]:e ")" ⇒ e }
          /  { "{" cslist<[l0expr], ";">:es ";"? "}" ⇒ seq(@es) }
          /  { "`" [l0expr]:e "`" ⇒ quasiquote(e) }
          /  { "::expr" "\" [l0expr]:e "\" ⇒ unquote(e) }
          /  { "::lift" [l0topexpr]:t ⇒ toplift(t) }
          /  { if "(" [l0expr]:cnd ")" [l0expr]:tr else [l0expr]:fl
                    ⇒ if3(cnd, tr, fl) }
          /  { if "(" [l0expr]:cnd ")" [l0expr]:tr
                    ⇒ if2(cnd, tr) }
          /  { fun "(" ecslist<[l0qident],",">:args ")" [l0expr]:b
                    ⇒ lambda(args, b) }
          /  { let cslist<[l0letpair],",">:ps in [l0expr]:b ⇒ let(ps, b) }
          /  { return [l0expr]:e ⇒ e }
          /  { [number]:n ⇒ const(number(n)) }
          /  { [string]:s ⇒ const(string(s)) }
          /  { "[" cslist<[l0expr],";">:es "]" ⇒ mklist(es) }
          /  { "[" "]" ⇒ const(nil()) }
          /  { "'" [l0qident]:s ⇒ const(symbol(s)) }
          /  { [l0qident]:id "(" ecslist<[l0expr],",">:args ")"
                    ⇒ apply(var(id),@args) }
          /  { [l0qident]:id ⇒ var(id) }
          ;
  // A single let binding node
  l0letpair >> letpair ⇐ [l0qident]:id "=" [l0expr]:e ⇒ p(id, e);

  // Potentially unquoted identifiers
  l0qident ⇐ { "\" [l0expr]:e "\" ⇒ unquote(e) }
           /  { [ident]:id ⇒ v(id) }
           ;
}
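The `binary l0expr` block above is compiled into a Pratt-style precedence-climbing parser. A minimal Python sketch of that technique follows; the token stream, node shape and PREC table are illustrative only, with precedence values mirroring the grammar (higher binds tighter, operators treated as left-associative).

```python
# Precedence table mirroring the p0binop* rules above (illustrative subset).
PREC = {'*': 1000, '/': 1000, '&&': 950, '||': 950,
        '<<': 900, '>>': 900, '+': 800, '-': 800,
        '<': 700, '==': 700, '::': 600, '@': 500}

def parse_expr(tokens, pos=0, min_prec=0):
    # tokens: operands (numbers) interleaved with operator strings
    lhs, pos = tokens[pos], pos + 1
    while pos < len(tokens) and PREC.get(tokens[pos], -1) >= min_prec:
        op = tokens[pos]
        # Left-associative: the right operand may only consume
        # strictly tighter-binding operators.
        rhs, pos = parse_expr(tokens, pos + 1, PREC[op] + 1)
        lhs = ('apply2', op, lhs, rhs)
    return lhs, pos
```

This is why `1 + 2 * 3` parses as `apply2(+, 1, apply2(*, 2, 3))` without any grammar stratification.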

An example of a language accepted by this parser would look like this:

/* Comment */
function foldl(fn, i, l)
{
   if(nullp(l)) return i
   else let hd = head(l),
            tl = tail(l)
         in foldl(fn, fn(i, hd), tl)
}


// One-line comment
print(foldl(fun(a,b) {a+b}, 0, [1;2;3;4;5;6]))

We will use this foldl function to illustrate the compilation pipeline.
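For reference, here is what the example program computes, sketched as equivalent Python (a direct transliteration, not part of the compiler):

```python
# foldl applies fn left-to-right over the list, threading an accumulator.
def foldl(fn, i, l):
    if not l:
        return i
    hd, tl = l[0], l[1:]
    return foldl(fn, fn(i, hd), tl)

result = foldl(lambda a, b: a + b, 0, [1, 2, 3, 4, 5, 6])
# result is 21
```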

There is nothing interesting to see here yet. We leave it just for reference - this is the function straight after parsing:

(define (v foldl) 
  (lambda ((v fn) (v i) (v l)) 
    (seq (if3 (apply (var (v nullp)) (var (v l))) (var (v i)) 
        (let ((p (v hd) (apply (var (v head)) (var (v l)))) 
              (p (v tl) (apply (var (v tail)) (var (v l))))) 
          (apply (var (v foldl)) (var (v fn)) 
            (apply (var (v fn)) (var (v i)) (var (v hd))) (var (v tl))))))))

Macro expansion

Logically, this is the next step straight after parsing. However, it is a relatively advanced subject, and you may prefer to skip it at first and go straight to the syntax sugar lowering, coming back to review this section after reading about the runtime-compiler feedback loop. At this moment we don't have any feedback loop; it can only be defined later, when the rest of the compiler pipeline is finalised. We'll leave an empty stub here meanwhile.

define l0_env_feedback_hook =
  mkref(λ(globenv, t) [])
  
function l0_env_feedback_loop(globenv, t)
  (^l0_env_feedback_hook)(globenv, t)

Unfortunately, macros imply a very early exposure to the concept of an environment, which, again, we'd prefer to leave for much later down the pipeline. We'll keep a dummy stub here instead.

define l0_env_find_macro_hook =
  mkref(λ(globenv, id) [])

function l0_env_find_macro(globenv, fn) {
  getid(fn) =
    visit:lang0src /rec/ (expr: fn) {
       deep expr {
          var → id
        | else → []};
       deep ident {
          v → id | else → []}};
  (^l0_env_find_macro_hook)(globenv, getid(fn))}

In order to be able to implement macros we need access to AST-constructing functions. The easiest way of abstracting them out nicely is to use quasiquotation: any AST inside a quasiquotation is translated into AST-constructing code that calls the appropriate runtime library functions.
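The idea can be sketched in Python terms: instead of returning the quoted AST itself, the compiler emits code that *builds* that AST at runtime, switching back to the real expression at unquote nodes. Node shapes and the `make_*` names here are illustrative, not the actual macrolib API.

```python
# Translate a (tagged-tuple) AST under quasiquote into AST-building code.
def compile_quasiquote(node):
    tag = node[0]
    if tag == 'unquote':
        # An unquoted expression is spliced in as-is: it will be
        # evaluated at runtime to produce a sub-AST.
        return node[1]
    if tag == 'var':
        # `x` becomes code that constructs a var node for symbol x
        return ('apply', 'make_var', [('const', node[1])])
    if tag == 'apply':
        fn, args = node[1], node[2]
        return ('apply', 'make_apply',
                [compile_quasiquote(fn),
                 [compile_quasiquote(a) for a in args]])
    raise ValueError(tag)
```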

function l0_compile_quasiquote(loop, e) {
   mkapp(id, args) = with target (lang0src) {
     mk:expr:apply(mk:var(mk:v(id)), args)};
   mkqsymbol(id) = with target (lang0src) {
     mk:expr:const(mk:symbol(id))};
   visit:lang0src /rec/ (expr: e) {
     once expr {
        unquote → e
      | deep → {
           lambda → mkapp('macrolib.make_lambda',
                           [mkapp('macrolib.make_list', mk:mklist(args));
                            body])
        |  seq → mkapp('macrolib.make_seq', [mk:mklist(body)])
        |  const → mkapp('macrolib.make_const', [node()])
        |  var → mkapp('macrolib.make_var', [id])
        |  apply → mkapp('macrolib.make_apply',
                          [fn; mk:mklist(args)])
        |  apply2 → mkapp('macrolib.make_apply2', [fn; L; R])
        |  if3 → mkapp('macrolib.make_if3', [cnd; tr; fl])
        |  if2 → mkapp('macrolib.make_if2', [cnd; tr])
        |  let → mkapp('macrolib.make_let', [mk:mklist(ps); body])
        |  mklist → mkapp('macrolib.make_mklist', [mk:mklist(es)])
        |  toplift → mkapp('macrolib.make_toplift', [t])
        |  else → ccerror('IMPOSSIBLE'())}};
     deep ident { v → mkqsymbol(id) | unquote → e };
     deep letpair {
        p → mkapp('macrolib.make_letpair', [id; v])};
     deep top {
        define → mkapp('macrolib.make_define', [id; e])
      | defmacro → mkapp('macrolib.make_defmacro', [id; e])
      | expr → mkapp('macrolib.make_expr', [e])
      | debug → mkapp('macrolib.make_debug', [])}}}

The expression expansion pass handles quasiquote nodes (consider quasiquote a special kind of "macro") and user-defined macros applied via the compiler feedback loop.

function l0_macro_expand_expr(globenv, addtop, e)
  visit:lang0src /rec/ (expr:e) {
     once expr {
         quasiquote → l0_compile_quasiquote(λ(e0) l0_macro_expand_expr(globenv, addtop, e0), e)
       | unquote → ccerror('MISPLACED-UNQUOTE'())
       | apply   → {
            ismacro = l0_env_find_macro(globenv, fn);
            if (ismacro) {
              nxt = ismacro(globenv, node());
              l0_macro_expand_expr(globenv, addtop, nxt)
            } else 
              mk:node(fn = l0_macro_expand_expr(globenv, addtop, fn),
                      args = map a in args do l0_macro_expand_expr(globenv, addtop, a))
         }
       | deep → { toplift → { addtop(node()); mk:seq(body=[]) }
                 | else → node() }}}

function l0_macro_expand_top(globenv, addtop, t)
  visit:lang0src /rec/ (top:t) {
     once expr { else →
       l0_macro_expand_expr(globenv, addtop, node()) }}

function l0_macro_expand(globenv, ts)
collector (addtop, gettop) {
   iter t in ts do
     visit:lang0src /rec/ (top: t) {
        once top {
           defmacro → l0_env_feedback_loop(globenv, node())
         | else → addtop(l0_macro_expand_top(globenv, addtop, node()))}};
   return gettop()}

Cleanup

Now all the quasiquotations, macro applications and toplift nodes are eliminated, and we can turn our AST into a form suitable for the rest of the compiler pipeline.

function l0_postexpand_sanitise(ts)
  map t in ts do
     visit:lang0src /rec, dst lang0/ (top: t) {
        deep expr {
           toplift → ccerror('IMPOSSIBLE'())
         | quasiquote → ccerror('IMPOSSIBLE'())
         | unquote → ccerror('IMPOSSIBLE'())
         | else → node()};
        deep ident {
           unquote → ccerror('IMPOSSIBLE'())
         | v → id}}

For compiler pipeline demo purposes, we need a dummy environment:

function l0_expand_dummy(ts)
  l0_postexpand_sanitise(
    l0_macro_expand(mkhash(), ts))

Lowering

Our first AST is designed for parsing, but it is not very useful for any further analysis, optimisation or code generation. Before we start actually compiling anything we have to eliminate some syntax sugar and make this AST simpler.

Eliminating syntax sugar

The two-argument application node was only introduced to simplify the parsing of binary expressions, and must go.

This is our first introduction to visitors, a fundamental building block in PFront. A visitor walks over an AST in a defined order and builds a new AST, transforming certain nodes with the given expressions. In this case we only rewrite the apply2 variant of the expr node, with all the boilerplate covering all possible paths to expr nodes being inferred automatically.

Variant constructors like mk:apply are context-dependent: in this visitor we know that the resulting AST is the same as the original AST, so mk:apply constructs lang0:expr:apply variants.

This function takes a list of toplevel statements as an argument, and returns a list of transformed toplevel statements.

function l0_eliminate_apply2(ts)
 // 'ts' is a list of top statements,
 //   we'll visit them one by one
 map t in ts do
  // A visitor block:
  //  *  The source AST is 'lang0', destination is the same
  //  *  It's a recform AST (not an old list-based one)
  //  *  The entry point is a 'top' node
  visit:lang0 /rec/ (top: t) {
     // All the 'expr' nodes are visited in a depth-first order
     deep expr {
        // All the 'apply2' variants of 'expr' are rewritten
        //  as 'apply' variants
        apply2 → mk:apply(fn = mk:var(fn), args = [L;R])
        // All the other variants will remain the same
      | else → node()}}
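The same depth-first rewrite can be sketched generically in Python, to make the visitor mechanics concrete. The tagged-tuple node shapes are a stand-in for the real recform AST, and `rewrite` plays the role of the inferred boilerplate.

```python
# Generic depth-first rewrite: transform children first, then apply the
# node-level rewrite function to the rebuilt node.
def rewrite(node, f):
    if isinstance(node, tuple):
        node = tuple(rewrite(c, f) for c in node)
        return f(node)
    if isinstance(node, list):
        return [rewrite(c, f) for c in node]
    return node

# The apply2 elimination itself is a one-variant rewrite.
def eliminate_apply2(node):
    if node and node[0] == 'apply2':
        _, fn, L, R = node
        return ('apply', ('var', fn), [L, R])
    return node

tree = ('apply2', 'binopadd', ('var', 'a'),
        ('apply2', 'binopmul', ('var', 'b'), ('var', 'c')))
out = rewrite(tree, eliminate_apply2)
```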

A two-armed if does not make sense in an expression-based language, and it must go. An implicit false branch returns a nil constant.

function l0_eliminate_if2(ts)
 map t in ts do
  visit:lang0 /rec/ (top:t) {
     deep expr {
        if2 → mk:if3(fl = mk:const(mk:nil()))
      | else → node()}}

And a list constructor is just a sequence of nested cons applications. We do not care whether it's entirely constant at this stage; all such optimisations can be done later.

function l0_eliminate_list(ts)
  map t in ts do
    visit:lang0 /rec/ (top:t) {
      deep expr {
        mklist → do loop(e = es) {
                   match e with
                      hd : tl → mk:apply(mk:var('cons'), [hd; loop(tl)])
                    | [] → mk:const(mk:nil())}
      | else → node()}}

And everything together now:

function l0_lowering(ts)
   l0_eliminate_list(
   l0_eliminate_if2(
   l0_eliminate_apply2(
   l0_expand_dummy(ts))))

Lexical scope

This is a lexically scoped language, allowing new definitions to shadow an outer scope. This is a default design decision for languages with pattern matching, for example. Our core language does not have any pattern matching support, but it can be added later with macros and syntax extensions.

The simplest approach to implementing lexical scope is to rename all the variables to make them unique. In a subsequent pass we'll collect variable and argument declarations and sort everything into categories: global variables, local variables and lambda arguments. Another category, closure variables, will appear later, after the lambda lifting pass.

function l0_lexical_scope_expr(e0, addrenames)
   do loop(e = e0, env = []) {

     // A helper function, returns a new name if it is in the current
     // environment or id unchanged (meaning it's a global name)
     renamevar(venv, id) =
        do iloop(e = venv) {
           match e with
             [(iid when iid === id); nnm]:tl → nnm
           | hd:tl → iloop(tl)
           | else → id};

     // A helper function to access an identifier bound
     // in a let pair
     getpident(p) = visit:lang0 /rec/ (letpair:p) {
        once letpair { p → id }};

     // A helper function: loop into a right hand side
     // of a let pair and rename a bound identifier
     loopps(ne, p) = visit:lang0 /rec/ (letpair:p) {
        deep letpair {
           p → mk:node(id = renamevar(ne, id),
                       v = loop(v, env))}};
     // Main visitor: for lambda and let nodes it
     // stops and recurses into their sub-nodes explicitly,
     // while all the other variants are traversed in the depth-first
     // order.
     visit:lang0 /rec/ (expr:e) {
        once expr {
          lambda → {
             newenv = map a in args do [a; gensym()];
             addrenames(newenv);
             mk:elambda(args = map [a;b] in newenv do b,
                        benv = env,
                        fenv = [],
                        body = loop(body, newenv ⊕ env))}
        | let → {
             newenv = map p in ps do [getpident(p); gensym()];
             addrenames(newenv);
             mk:node(ps = map p in ps do loopps(newenv, p),
                     body = loop(body, newenv ⊕ env))}
        | deep → { var → mk:node(id = renamevar(env, id))
                  | else → node() }}}}
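The renaming idea itself fits in a few lines of Python. This sketch uses a dictionary for the environment instead of the association list above; `gensym` is a toy counter-based version, and unresolved names are treated as globals.

```python
import itertools

counter = itertools.count()
def gensym(base):
    # Fresh, unique name every call (unique within one run).
    return f"{base}_{next(counter)}"

def rename(node, env):
    tag = node[0]
    if tag == 'var':
        # Bound names are replaced; unbound names are globals.
        return ('var', env.get(node[1], node[1]))
    if tag == 'lambda':
        _, args, body = node
        fresh = {a: gensym(a) for a in args}
        return ('lambda', list(fresh.values()),
                rename(body, {**env, **fresh}))
    if tag == 'apply':
        return ('apply', rename(node[1], env),
                [rename(a, env) for a in node[2]])
    return node
```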

It would be nice to keep track of all the renames we did; it could help with producing debug information and with giving error messages more context.

function l0_lexical_scope(ts)
 collector(add, get) {
   ret = map t in ts do
    visit:lang0 /rec/ (top:t) {
       once expr { else → l0_lexical_scope_expr(node(), add) }};
   with target (lang0) {
      mk:top:debug('renames'(@map append g in get() do g)) : ret
   }}

Now it's time to stop using the initial AST and move on. We eliminated a few variants, and are going to introduce more specific variable variants: global, local, argument and closure variable. We're also introducing variants for the subsequent lambda lifting.

The new AST is derived from the previous one (lang0), with a few modifications made to the expr node.

ast lang0i : lang0 () recform {
  expr +=
      global(ident:id)
    | local(ident:id)
    | arg(ident:id)
    | clvar(ident:id)
    | mkclosure(ident:df, *expr:args)
    | clambda(*ident:clenv, *ident:args, expr:body)
    | -var
    | -apply2
    | -if2
    | -mklist
    | -lambda
    ;}

Now we can make our variables more specific, and since all the names are unique now, we can do it easily in a depth-first order. Of course we could have done it in the previous pass, but it's better to keep passes as simple as possible.

function l0_classify_vars_top(t) {
  dict = mkhash();
  // Fill a hash table with origins of all the definitions
  filldict() =
    visit:lang0 /rec/ (top:t) {
       deep expr {
          elambda → iter a in args do ohashput(dict, a, 'arg')
        | else → []};
       deep letpair {
          p → ohashput(dict, id, 'local')}};
  filldict();

  // Replace all the var nodes with specific versions
  // and use this opportunity to mark lambda bound variables lists with
  // nearly correct origins. We're still missing closure variables, they will
  // appear later.
  visit:lang0 /rec, dst lang0i/ (top:t) {
    deep expr {
      elambda → mk:node(benv =
                    map [o;n] in benv do
                      [n; ohashget(dict, n)])
    | var → aif (ct = ohashget(dict, id)) {
                if(ct === 'arg')
                   mk:arg(id)
                else if(ct === 'local')
                   mk:local(id)
                else ccerror('WHAT?'(ct))
             } else mk:global(id)
    | else → node()}}}

function l0_classify_vars(ts)
  map t in ts do
    l0_classify_vars_top(t)

Lambda lifting

The lambda lifting pass turns all the nested lambda functions into toplevel definitions, capturing closure environments where necessary. We did the lexical scoping pass previously, so all the names inside any given context are unique, which makes it easy to produce lists of externally defined variables used under each lambda context.

First we have to propagate the free variables lists (with bound variables already collected in the lexical scoping pass), from top to bottom.

function l0_lambda_lift_freevars(e) {
  // A helper function: returns a list of all
  // identifiers used in a given expression (bound or not)
  collectvars(e) = collector(addv, getvs) {
     visit:lang0i /rec/ (expr: e) {
        once expr {
           elambda → {iter v in fenv do addv(v)}
         | deep → {
              local → addv(id)
            | arg → addv(id)
            | else → []}}};
     return getvs()};
  // Add free variable lists to all the lambda nodes,
  // in a depth-first order.
  visit:lang0i /rec/ (expr: e) {
     deep expr {
        elambda → mk:node(fenv = collectvars(body))
      | else → node()}}}
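The underlying free-variables computation is the classic one: every variable reference minus the binders in scope. A Python sketch over tagged tuples, for intuition; intersecting the resulting set with the enclosing bound variables is what yields the closure environment in the next pass.

```python
# Free variables of an expression: references not bound by any
# enclosing lambda.
def free_vars(node, bound=frozenset()):
    tag = node[0]
    if tag == 'var':
        return set() if node[1] in bound else {node[1]}
    if tag == 'lambda':
        _, args, body = node
        return free_vars(body, bound | set(args))
    if tag == 'apply':
        out = free_vars(node[1], bound)
        for a in node[2]:
            out |= free_vars(a, bound)
        return out
    return set()
```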

Now we can use the intersection of the bound variables list and the free variables list to form a closure environment.

function l0_lambda_lift_expr(e, clenv, adddef)
  // An entry for nested lambdas, to keep track of variables
  // that were turned into closure variables in a lifted context.
  do loop(e = e, renv = []) {
   // A helper function, generating a list of
   // captured closure variables out of bound and free variables lists
   getclenv(benv, fenv) = with target(lang0i) {
      ht = mkhash();
      iter nm in fenv do ohashput(ht, nm, nm);
      map append [nm;cl] in benv do {
        if(ohashget(ht, nm)) {
           if (memq(nm, renv)) [[nm; mk:expr:clvar(nm)]]
           else if (cl === 'local') [[nm; mk:expr:local(nm)]]
           else if (cl === 'arg')   [[nm; mk:expr:arg(nm)]]
           else ccerror('WHAT?'(cl))
        } else []}};
   // A main visitor, lifting all the lambda nodes into new
   // toplevel definitions and replacing them with closure allocations,
   // even if no closure environment is captured (we'll deal with this
   // case later). Also using this opportunity to make closure variable
   // references specific.
   visit:lang0i /rec/ (expr: e) {
     once expr {
        elambda → {
           intersect = getclenv(benv, fenv);
           clargs = map [nm;cl] in intersect do nm;
           newlambda = mk:clambda(clenv = clargs,
                                  args = args,
                                  body = loop(body, clargs ⊕ renv));
           newnm = gensym();
           adddef(mk:top:define(newnm, newlambda));
           return mk:mkclosure(newnm, map [nm;cl] in intersect do cl)}
      | local → if(memq(id, renv)) mk:clvar(id) else node()
      | arg   → if(memq(id, renv)) mk:clvar(id) else node()
      | deep → { else → node() }}}}

Obviously we do not need to lift lambda expressions that are already at the top level, so we have to make an exception for this case.

function l0_lambda_lift_top(t, adddef) {
  liftexpr(e) = l0_lambda_lift_expr(l0_lambda_lift_freevars(e), [], adddef);
  visit:lang0i /rec/ (top: t) {
     once expr { elambda → mk:clambda(clenv=[], args = args, body = liftexpr(body))
               | else → liftexpr(node()) }}}

Lambda-lifting a single top level statement may produce any number of new top level statements, so we'll just collect all the new statements together into one flat list.

function l0_lambda_lift(ts) {
  collector(adddef, getdefs) {
     iter t in ts do
        adddef(l0_lambda_lift_top(t, adddef));
     return getdefs()}}

And now, all the lexical scope and lambda lifting passes together:

function l0_lexical(ts)
   l0_lambda_lift(
   l0_classify_vars(
   l0_lexical_scope(ts)))

Preparing for code generation

By now we have moved all the lambdas into top level definitions, made closure environments explicit, eliminated lexical scope and reduced some syntax sugar.

Let's refine our AST further. We can classify top level statements as functions (i.e., lambdas with an empty closure environment), closures, constants, evaluated definitions and, finally, toplevel expressions.

We already eliminated the temporary elambda variant during lambda lifting, and are now going to get rid of nested let bindings, replacing them with flat imperative assignments. The same AST introduces a special tail call variant.

Another variant introduced here is splitexpr, which is required for separating expressions from statements further down the pipeline.

ast lang0j : lang0i () recform {
  top = define(ident:id, expr:e)
      | defconst(ident:id, expr:e)
      | sfunction(ident:id, *ident:args, expr:body)
      | closure(ident:id, *ident:clargs, *ident:args, expr:body)
      | expr(expr:e)
      | debug(.*any:es)
      ;
  expr += -elambda
       |  -clambda
       |  alloca(ident:id)
       |  set(ident:id, expr:v)
       |  drop(expr:v)
       |  funref(ident:id)
       |  tailapply(expr:fn, .*expr:args)
       |  return(expr:v)
       |  splitexpr(expr:l, expr:r)
       ;}

Removing toplevel lambdas

%"lang0i:*top -> lang0j:*top"
function l0_classify_functions(ts)
 map t in ts do 
   visit:lang0i /rec, dst lang0j/ (top: t) {
      deep top {
         define → e(id)
       | expr → e()
       | debug → node()};
      once expr(dst) {
         clambda → if (clenv)
                         mk:top:closure(id = dst,
                                        clargs = clenv,
                                        args = args, body = body)
                    else mk:top:sfunction(id = dst, args = args, body = body)
      |  const → if (dst) mk:top:defconst(id = dst, e = node())
                  else     mk:top:expr(mk:const(mk:nil()))
      |  else  → if (dst) mk:top:define(id = dst, e = node())
                  else     mk:top:expr(node())}}

function l0_classify_calls(ts)
  map t in ts do
     visit:lang0j /rec/ (top: t) {
        deep expr {
           mkclosure → if (args) node()
                        else      mk:funref(df)
        |  else → node()}}

Flattening bindings

Instead of our nice and clean functional variable binding, we'll produce an imperative sequence of destructive assignments here. Some other passes will also introduce true destructive assignments, e.g., lifting the statement expressions and optimising direct tail recursion, so no purity is lost here that wouldn't be lost anyway. A subsequent SSA transform will make our IR clean again.

%"lang0j:expr -> lang0j:expr"
function l0_flatten_lets_topexpr(e)
collector(addalloca, getallocas)
with target (lang0j) {
   ret = visit:lang0j /rec/ (expr: e) {
      deep expr {
         let → mk:seq(body = ps ⊕ [body])
       | else → node()};
      deep letpair {
         p → {
            addalloca(mk:expr:alloca(id));
            mk:expr:set(id, v)}}};
   return mk:expr:seq(body = getallocas() ⊕ [ret])}

function l0_flatten_lets(ts)
  map t in ts do
    visit:lang0j /rec/ (top:t) {
       once expr { else → l0_flatten_lets_topexpr(node()) }}

Redundant nested seq

There is no reason for doing this besides pure aesthetics. If you want to dump the IR somewhere in the middle of the pipeline, deeply nested seq nodes can be quite annoying.
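The tidying itself is simple: splice nested sequence bodies into their parent and unwrap single-element sequences. A minimal Python sketch over the same toy tuple AST (the real visitor also descends into non-seq nodes):

```python
def tidy_seq(e):
    """Splice the bodies of nested ("seq", [...]) nodes into their
    parent, and unwrap single-element sequences."""
    if not (isinstance(e, tuple) and e[0] == "seq"):
        return e
    flat = []
    for b in e[1]:
        b = tidy_seq(b)
        if isinstance(b, tuple) and b[0] == "seq":
            flat.extend(b[1])   # splice a nested sequence
        else:
            flat.append(b)
    return flat[0] if len(flat) == 1 else ("seq", flat)
```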

function l0_seq_tidy_topexpr(e)
   do loop(e = e) {
      nested(e) =
        do nestedloop(e = e)
         visit:lang0j /rec/ (expr:e) {
            once expr {
               seq → map append b in body do nestedloop(b)
             | else → [loop(node())]}};
      entry(e) =
         visit:lang0j /rec/ (expr:e) {
            once expr {
               seq → {match body with
                          [one] → loop(one)
                        | else → mk:node(body = map append b in body do nested(b))}
            |  deep → { else → node() }}};
      entry(e)}

function l0_seq_tidy(ts)
  map t in ts do
     visit:lang0j /rec/ (top:t) {
       once expr { else → l0_seq_tidy_topexpr(node()) }}

Marking tail calls

At this stage, the IR is suitable for the tail call analysis, which would have been more difficult earlier and will be clumsier later. Results of this marking will be used further down in a direct tail recursion optimisation, and are required for the statement vs. expression split, since we're also introducing explicit return statements here.
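The marking only ever follows the returning branches: the last element of a sequence and both arms of an if. A Python sketch of this traversal over a toy tuple AST (node names are illustrative):

```python
def mark_tails(e):
    """Follow only the returning branches: the last element of a seq
    and both arms of an if.  Applications in tail position become
    tailapply; everything else gets an explicit return wrapper."""
    kind = e[0]
    if kind == "seq":
        if not e[1]:
            return e
        *init, last = e[1]
        return ("seq", init + [mark_tails(last)])
    if kind == "if3":
        _, cnd, tr, fl = e
        return ("if3", cnd, mark_tails(tr), mark_tails(fl))
    if kind == "apply":
        return ("tailapply",) + e[1:]
    return ("return", e)
```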

function tsplit(l0)
  do iloop(l = l0, acc = [])
     match l with
       [hd] → [acc; hd]
     | [] → [[];[]]
     | hd:tl → iloop(tl, acc ⊕ [hd])

function l0_tailcalls_topexpr(e0) 
   do loop(e = e0)
     visit:lang0j /rec/ (expr: e) {
        once expr {
           seq → if(body) {
                     <[a;b]> = tsplit(body);
                      mk:node(body = a ⊕ [loop(b)])}
                  else node()
        |  if3 → mk:node(tr = loop(tr), fl = loop(fl))
        |  apply → mk:tailapply(fn = fn, args = args)
        |  else → mk:return(node())}}

function l0_tailcalls(ts)
  map t in ts do
     visit:lang0j /rec/ (top:t) {
       once expr { else → l0_tailcalls_topexpr(node()) }}

Statement expressions

A procedure very similar to tail call marking, but for the statement expressions (like if and seq yielding value that is used inside another expression). We'll need it later on when we actually split statements and expressions.

The following function, just like a tail call marker pass before, will follow the returning branches of expressions only --- i.e., last entry in a sequence and both arms of an if, but instead of inserting a return or a tail call marker it will assign that value to a variable or wrap it into a drop marker, meaning that the result is going to be discarded anyway.

function l0_stmtexpr_inner(e0, dst) 
   do loop(e = e0)
     visit:lang0j /rec/ (expr: e) {
        once expr {
           seq → if(body) {
                     <[a;b]> = tsplit(body);
                      mk:node(body = a ⊕ [loop(b)])}
                  else node()
        |  if3 → mk:node(tr = loop(tr), fl = loop(fl))
        |  alloca → node()
        |  set → node()
        |  else → if (dst) mk:set(dst, node())
                   else mk:drop(node())}}

The next function identifies statement--expressions and treats them in a special way: any value returned from the non--terminal sequence elements is dropped, while any value of a complex statement--expression that is actually used is assigned to a variable, and a special splitexpr variant is created, holding both the statement that generates the value and an expression to access this value. After the split between statements and expressions is completed, those inner statements will be lifted to their nearest statement scopes.
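The splitexpr idea can be sketched in Python as follows: whenever a statement-expression appears in a value position, give it a fresh destination variable and replace the use site with a read of that variable (the `"compute-into"` node below is a hypothetical stand-in for what l0_stmtexpr_inner produces):

```python
import itertools

counter = itertools.count()

def lift_stmt_exprs(e, lifted):
    """If a subexpression is a statement-expression (if3/seq here),
    assign its value to a fresh variable and replace the use site with
    a read of that variable; collect the statement into `lifted`."""
    if not isinstance(e, tuple):
        return e
    if e[0] in ("if3", "seq"):
        dst = f"t{next(counter)}"
        lifted.append(("alloca", dst))
        lifted.append(("compute-into", dst, e))  # stand-in for l0_stmtexpr_inner
        return ("local", dst)
    return tuple(lift_stmt_exprs(x, lifted) for x in e)
```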

function l0_stmtexpr_topexpr_inner(e0, addalloca)
  do loop(e = e0) {
   inner(e) =
     visit:lang0j /rec/ (expr:e) {
        once expr {
           seq → symbols(dst) {
                   addalloca(mk:allocalang0j:expr:alloca(id)(dst));
                   mk:splitexprlang0j:expr:splitexpr(l,r)(
                     l0_stmtexpr_inner(loop(node()), dst),
                     mk:locallang0j:expr:local(id)(dst))}
         | if3 → symbols(dst) {
                   addalloca(mk:allocalang0j:expr:alloca(id)(dst));
                   mk:splitexprlang0j:expr:splitexpr(l,r)(
                     l0_stmtexpr_inner(loop(node()), dst),
                     mk:locallang0j:expr:local(id)(dst))}
         | deep → { else → node() }}};
   toplev(e) =
     visit:lang0j /rec/ (expr:e) {
        once expr {
           seq → if (body) {
                     <[a;b]> = tsplit(body);
                      mk:node(body = [@map a do l0_stmtexpr_inner(loop(a), []);
                                       loop(b)])
                  } else node()
         | if3 → mk:node(cnd = inner(cnd), tr = loop(tr), fl = loop(fl))
         | set → mk:node(v = inner(v))
         | else → inner(node())}};
   toplev(e)}

The previous function could have created some new variables, so we have to add corresponding allocas to the beginning of the outer context.

function l0_stmtexpr_topexpr(e0)
  collector (addalloca, getallocas)
  with target (lang0j) {
    ret = l0_stmtexpr_topexpr_inner(e0, addalloca);
    allocas = getallocas();
    if(allocas)
      mk:expr:seq(body = [@allocas;ret])
    else ret}

function l0_stmtexpr(ts)
  map t in ts do
     visit:lang0j /rec/ (top: t) {
        once expr { else → l0_stmtexpr_topexpr(node()) }}

Detecting tail recursion

Some tail calls are tail recursive, and this is the right moment to replace them with a goto. The language does not have gotos and labels? Not a problem at all, we'll make a new language now:

ast lang0k : lang0j () recform {
  expr += label(labident:id)
       |  goto(labident:id)
       | -let
       ;
  hint expr:alloca = newline;
  hint expr:set = newline;
  hint expr:seq = newline;
  hint expr:if3 = newline;
  hint expr:goto = newline;
  hint expr:label = newline;
  hint expr:return = newline;
}

If a function is directly tail recursive, we have to create temporary variables that initially contain copies of the argument values and are then used to pass arguments in a tail--recursive call. In this case all the argument references in the function body must be replaced with local variable references.
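The effect of this transformation is easy to see on foldl itself, written out in Python: the recursive definition becomes an argument-copying loop, with the simultaneous reassignment playing the role of the set instructions and `while True` standing in for the tailrecentry label:

```python
def foldl_rec(f, acc, xs):
    """Directly tail-recursive definition."""
    if not xs:
        return acc
    return foldl_rec(f, f(acc, xs[0]), xs[1:])   # tail call to self

def foldl_loop(f, acc, xs):
    """The same function after the tail-recursion rewrite."""
    while True:                                  # label tailrecentry
        if not xs:
            return acc
        # set the argument copies, then goto tailrecentry
        f, acc, xs = f, f(acc, xs[0]), xs[1:]
```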

function l0_tailrec_topexpr(self, fnargs, e) {
   trecp = mkref();
   isself(e) =
      visit:lang0k /rec/ (expr:e) {
         once expr {
           global → id === self
         | else → []}};
   rewriteargs(e) =
      visit:lang0k /rec/ (expr:e) {
         deep expr {
            arg → mk:local(id)
          | else → node()}};
   body0 = visit:lang0j /rec, dst lang0k/ (expr: e) {
      deep expr {
         tailapply → if (isself(fn)) {
           trecp := true;
           mk:seq(body = [@map [a;v] in zip(fnargs, args) do mk:set(a, v);
                          mk:goto('tailrecentry')])
         } else node()
       | else → node()}};
   if (^trecp) with target (lang0k) {
     mk:expr:seq(body = [@map a in fnargs do mk:alloca(a);
                         @map a in fnargs do mk:set(a, mk:arg(a));
                         mk:goto('tailrecentry');
                         mk:label('tailrecentry');
                         rewriteargs(body0)]) 
   } else body0}

function l0_tailrec(ts)
  map t in ts do
     visit:lang0j /rec, dst lang0k/ (top: t) {
        once top {
           sfunction → mk:node(body = l0_tailrec_topexpr(id, args, body))
         | deep → { else → node() }};
        once expr {
           // We still have to do this pass even if it's not a function,
           // because we're moving to another AST version here.
           else → l0_tailrec_topexpr('*dummy*', [], node())}}

All together

function l0_codegen_prep(ts)
  l0_tailrec(
  l0_stmtexpr(
  l0_tailcalls(
  l0_seq_tidy(
  l0_flatten_lets(
  l0_classify_calls(
  l0_classify_functions(ts)))))))

function l0_lowering_driver(src) {
  x1 = l0_codegen_prep(
       l0_lexical(
       l0_lowering(src)));
  return x1 
}

At this point, the foldl function looks like this:

(sfunction foldl (Z541026 Z541027 Z541028) 
  (seq 
    (alloca Z541026) 
    (alloca Z541027) 
    (alloca Z541028) 
    (set Z541026 (arg Z541026)) 
    (set Z541027 (arg Z541027)) 
    (set Z541028 (arg Z541028)) 
    (goto tailrecentry) 
    (label tailrecentry) 
    (seq 
      (alloca Z541029) 
      (alloca Z541030) 
      (if3 (apply (global nullp) (local Z541028)) 
        (return (local Z541027)) 
        (seq 
          (set Z541029 (apply (global head) (local Z541028))) 
          (set Z541030 (apply (global tail) (local Z541028))) 
          (seq 
            (set Z541026 (local Z541026)) 
            (set Z541027 (apply (local Z541026) (local Z541027) (local Z541029))) 
            (set Z541028 (local Z541030)) 
            (goto tailrecentry)))))))

Code generation: an imperative IR

By now our source functional language got transformed into an imperative one, with explicitly allocated variables, destructive assignments and properly marked tail calls. Nested lambda functions and lexical scope were eliminated. There is already a clear separation between statements and expressions that occurred naturally during the previous passes. Goto, seq, drop, alloca, set, if, return and tailapply are statements, while everything else is an expression.

If our target was a stack machine, it would have been possible to keep an expression-based IR all the way down. Unfortunately, stack machines are not quite fit for optimisations and analysis, and we're not going down the CPS route either.

In an SSA-based IR we have to *assign* results to named registers. Control flow statements must be handled differently from the expressions yielding results, and for this reason we're starting this split here.

It is an important compiler construction pattern that occurs even in the languages that had distinct statements from the beginning.

We can formalise this split now. The new AST lang0l defines everything that returns a value as an expression and every control flow construct as a statement. An if3 statement does not return anything any longer; we ensured this by assigning its return value to a variable previously, in a preparation phase.

ast lang0l : lang0k () recform {
  top = define(ident:id, stmt:e)
      | defconst(ident:id, expr:e)
      | sfunction(ident:id, *ident:args, stmt:body)
      | closure(ident:id, *ident:clargs, *ident:args, stmt:body)
      | eval(stmt:e)
      | debug(.*any:es)
      ;
  expr =
        const(const:c)
      | global(ident:id)
      | local(ident:id)
      | arg(ident:id)
      | clvar(ident:id)
      | mkclosure(ident:df, *expr:args)
      | funref(ident:id)
      | apply(expr:fn, .*expr:args)
      | splitexpr(stmt:l, expr:r)
      ;
  stmt =
        seq(.*stmt:body)
      | if3(expr:cnd, stmt:tr, stmt:fl)
      | alloca(ident:id)
      | set(ident:id, expr:v)
      | tailapply(expr:fn, .*expr:args)
      | return(expr:v)
      | label(labident:id)
      | goto(labident:id)
      | drop(expr:v)
      ;
}

The following pass does not change the tree structure (besides renaming the toplevel expr to eval), it merely sorts out the variants into their new nodes.

function l0_split_statements(ts)
  map t in ts do
    visit:lang0k /rec, dst lang0l/ (top: t) {
       deep top {
          expr → mk:top:eval(e)
        | else → node()};
       deep expr {
          seq → mk:stmt:seq()
        | if3 → mk:stmt:if3()
        | alloca → mk:stmt:alloca()
        | set → mk:stmt:set()
        | tailapply → mk:stmt:tailapply()
        | return → mk:stmt:return()
        | label → mk:stmt:label()
        | goto → mk:stmt:goto()
        | drop → mk:stmt:drop()

        | else → node()}}

After this pass we still have that splitexpr thing in our IR, and it would be nice to flatten it before we go any further.

function l0_lower_splitexpr_stmt(s) {
   collectsplit(s, add) =
     visit:lang0l /rec/ (stmt:s) {
        deep expr {
           splitexpr → {
              add(l);
              return r}
        |  else → node()}};
   visit:lang0l /rec/ (stmt: s) {
     deep stmt {
        else → collector(add, get) {
           nbody = collectsplit(node(), add);
           lifted = get();
           if (lifted) mk:seq(body = [@lifted; nbody])
           else nbody}}}}

function l0_lower_splitexprs(ts)
   map t in ts do
     visit:lang0l /rec/ (top: t) {
        once stmt { else → l0_lower_splitexpr_stmt(node())}}

Flat IR

With a clean and simple low level imperative IR we can now turn to a different set of compiler construction tools. First we'll lower this IR into an SSA form and apply all the standard SSA-based torture techniques to it.

The first step in flattening will produce almost the same IR, with all the complex expressions broken down to register assignments. We could have used local variable assignments here, but since we're going into an SSA soon anyway it does not matter if "registers" are introduced a bit earlier.

Note this lock node here - it is merely an optimisation, to ensure the visitor is not going into an already flattened code.

ast lang0flat0 : lang0l () recform {
  stmt += def(ident:id, expr:v);
  expr += reg(ident:id)
       |  load(ident:id)
       |  lock(expr:e);}

Expression flattening

Getting ready to descend to SSA. This involves adding explicit load and store instructions for the local stack--allocated variables, so we'll lift local references alongside with compound expressions here.
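The essence of the lifting is easy to show in Python over a toy tuple AST: every compound subexpression gets its own register definition, and every read of a stack-allocated local becomes an explicit load (this sketch skips the lock/topexpr machinery that keeps the real visitor from revisiting flattened code):

```python
import itertools

counter = itertools.count()

def flatten(e, out):
    """Reduce a nested expression to a leaf value, emitting one
    ("def", reg, expr) per compound subexpression and an explicit
    ("load", var) for every local variable read."""
    if e[0] in ("const", "arg", "global"):
        return e                                  # leaves stay in place
    if e[0] == "local":
        r = f"r{next(counter)}"
        out.append(("def", r, ("load", e[1])))    # explicit load
        return ("reg", r)
    flat = tuple(flatten(x, out) if isinstance(x, tuple) else x for x in e)
    r = f"r{next(counter)}"
    out.append(("def", r, flat))                  # lift the compound expression
    return ("reg", r)
```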

function l0_flatten_stmt(s) {
  liftexpr(e, add, loop) =
    visit:lang0flat0 /rec/ (expr: e) {
       once expr {
           const → node()
         | global → node()
         | arg → node()
         | clvar → node()
         | funref → node()
         | local → symbols(newid) {
              add(mk:stmt:def(newid, mk:lock(mk:load(id))));
              return mk:reg(newid)}
         | else → symbols(newid) {
              add(mk:stmt:def(newid, mk:lock(loop(node()))));
              return mk:reg(newid)}}};
  // Skip the topmost expression and lift subexpressions only
  doexpr(e, add) = do loop(e = e)
    visit:lang0flat0 /rec/ (topexpr: e) {
       topexpr as expr { else → node() };
       once expr { lock → node()
                 | else → liftexpr(node(), add, loop) }};
  // Skip the topmost statements and process statements one level down
  dostmt(s, add) =
    visit:lang0flat0 /rec/ (topstmt: s) {
       topstmt as stmt { else → node() };
       once stmt { else → node() };
       once expr { lock → node() |
                   else → liftexpr(node(), add, λ(e) doexpr(e, add)) }};
  visit:lang0l /rec, dst lang0flat0/ (stmt: s) {
    deep stmt {
       else → collector(add, get) {
           ret = dostmt(node(), add);
           lifted = get();
            if(lifted) mk:seq(body = [@lifted; ret])
           else       ret
           }}}}

The lock nodes introduced were temporary and must be eliminated now:

function l0_flatten_unlock(s)
   visit:lang0flat0 /rec/ (stmt: s) {
      deep expr {
         lock → e
       | else → node()}}

function l0_flatten(ts)
  map t in ts do
     visit:lang0l /rec, dst lang0flat0/ (top: t) {
        once stmt { else →
                     l0_flatten_unlock(
                       l0_flatten_stmt(node())) }}

Statement flattening

Now we can get rid of the compound statements, replacing them with gotos and labels. At the moment we only have one kind of compound statement left - an if.
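The expansion scheme for a single if is fixed: one conditional branch and three fresh labels. A Python sketch over flat instruction lists (label names are generated, as the symbols form does in the pass below):

```python
import itertools

counter = itertools.count()

def lower_if3(cnd, tr_stmts, fl_stmts):
    """Expand one if3 into a conditional branch plus three labels:
    true arm, false arm, and the join point."""
    ltr, lfl, lnext = (f"L{next(counter)}" for _ in range(3))
    return ([("gotoc", cnd, ltr, lfl),
             ("label", ltr)] + tr_stmts +
            [("goto", lnext),
             ("label", lfl)] + fl_stmts +
            [("goto", lnext),
             ("label", lnext)])
```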

ast lang0flat1 : lang0flat0 () recform {
  stmt += gotoc(expr:cnd, labident:tr, labident:fl)
       | -if3
       ;
  expr += -lock;
  hint stmt = newline;
  }

function l0_flatten_cfg_stmt(s) 
   visit:lang0flat0 /rec, dst lang0flat1/ (stmt: s) {
      deep stmt {
         if3 → symbols(ltr, lfl, lnext) {
            mk:seq(body = [mk:gotoc(cnd, ltr, lfl);
                           mk:label(ltr);
                           tr;
                           mk:goto(lnext);
                           mk:label(lfl);
                           fl;
                           mk:goto(lnext);
                           mk:label(lnext)])}
       | else → node()}}

The previous passes could produce nested seq blocks. Since this is the only compound statement left by now we can just flatten everything into a single topmost sequence.

function l0_flatten_seq(s) {
   seq = visit:lang0flat1 /rec/ (stmt: s) {
      deep stmt {
         seq → map append b in body do b
       | else → [node()]}};
   with target (lang0flat1) {
      mk:stmt:seq(body = seq)}}

function l0_flatten_cfg(ts)
   map t in ts do
      visit:lang0flat0 /rec, dst lang0flat1/ (top: t) {
         once stmt {
           else → l0_flatten_seq(l0_flatten_cfg_stmt(node()))}}

function l0_flatten_driver(src) {
  x1 = l0_lowering_driver(src);
  x2 = l0_lower_splitexprs(
       l0_split_statements(x1));
  x3 = l0_flatten_cfg(
       l0_flatten(x2));
  return x3
}

And our sample foldl function looks like this after flattening:

(sfunction foldl (Z555561 Z555562 Z555563) 
  (seq 
    (alloca Z555561) 
    (alloca Z555562) 
    (alloca Z555563) 
    (set Z555561 (arg Z555561)) 
    (set Z555562 (arg Z555562)) 
    (set Z555563 (arg Z555563)) 
    (goto tailrecentry) 
    (label tailrecentry) 
    (alloca Z555564) 
    (alloca Z555565) 
    (def Z555578 (load Z555563)) 
    (def Z555577 (apply (global nullp) (reg Z555578))) 
    (gotoc (reg Z555577) Z555579 Z555580) 
    (label Z555579) 
    (def Z555566 (load Z555562)) 
    (return (reg Z555566)) 
    (goto Z555581) 
    (label Z555580) 
    (def Z555568 (load Z555563)) 
    (def Z555567 (apply (global head) (reg Z555568))) 
    (set Z555564 (reg Z555567)) 
    (def Z555570 (load Z555563)) 
    (def Z555569 (apply (global tail) (reg Z555570))) 
    (set Z555565 (reg Z555569)) 
    (def Z555571 (load Z555561)) 
    (set Z555561 (reg Z555571)) 
    (def Z555573 (load Z555561)) 
    (def Z555574 (load Z555562)) 
    (def Z555575 (load Z555564)) 
    (def Z555572 (apply (reg Z555573) (reg Z555574) (reg Z555575))) 
    (set Z555562 (reg Z555572)) 
    (def Z555576 (load Z555565)) 
    (set Z555563 (reg Z555576)) 
    (goto tailrecentry) 
    (goto Z555581) 
    (label Z555581)))

Basic blocks

Now everything is perfectly flat, so we can sort instructions into basic blocks. It is also the right moment to do another node split - separating compound expressions from leaf values. It will help to enforce the SSA guarantee of never reassigning simple values (with the only exception of the $\varphi$ nodes, but this will be sorted out later).

Note that tailapply is gone now as it served its purpose already, replaced with an expr node - it will be useful later when we have to annotate calls.

This will be our main IR for quite a while now.

ast lang0flat2 : lang0flat1 (stmt → insn) recform {
  top =   define(ident:id, code:e)
        | defconst(ident:id, value:e)
        | sfunction(ident:id, *ident:args, code:body)
        | closure(ident:id, *ident:clargs, *ident:args, code:body)
        | eval(code:e)
        | debug(.*any:es)
        ;
  code = c(.*bblock:bs);
  bblock =
          bb(labident:lbl, *insn:body, term:t);
  value = const(const:c)
        | global(ident:id)
        | arg(ident:id)
        | clvar(ident:id)
        | reg(ident:id)
        | funref(ident:id)
        ;
  expr =  mkclosure(ident:df, *value:args)
        | apply(bool:purep, value:fn, .*value:args)
        | load(ident:id)
        // New nodes
        | phi(.*phisrc:ss)
        | select(value:cnd, value:tr, value:fl)
        // Feedback only, not allowed for redefinitions
        | value(value:v)
        ;
  phisrc = s(labident:from, value:v);
  insn =
          alloca(ident:id)
        | def(ident:id, expr:v)
        | set(ident:id, value:v)
        ;
  term =  tail(expr:e)
        | goto(labident:id)
        | gotoc(value:cnd, labident:tr, labident:fl)
        | return(value:v)
        ;
  hint phisrc = newline;
  hint insn = newline;
  hint bblock = newline;
  hint term = newline;
  hint expr:phi = itab, newline;
}

The first pass over the flat code is to eliminate redundancy: e.g., if there is a sequence of labels, only one should remain (and be referred to), because a basic block can only have one name. If a terminal instruction follows another terminal instruction, it is unreachable and must be eliminated.

Once this filtering is done, rules for accumulating instructions into basic blocks are much simpler.
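With a sanitised list, block accumulation is a single left-to-right scan: a label opens a block, a terminator closes it. A Python sketch over flat instruction tuples, assuming the same invariants the sanitiser establishes:

```python
def basic_blocks(insns):
    """Group a flat, sanitised instruction list into
    (label, body, terminator) triples: every block is introduced by
    at most one label and closed by exactly one terminator."""
    TERMS = {"goto", "gotoc", "return", "tail"}
    blocks, label, body = [], "entry", []
    for i in insns:
        if i[0] == "label":
            label, body = i[1], []          # open a new block
        elif i[0] in TERMS:
            blocks.append((label, body, i)) # terminator closes the block
            label, body = None, []
        else:
            body.append(i)
    return blocks
```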

function l0_seq_body(s)
  visit:lang0flat1 /rec/ (stmt:s) {
     once stmt { seq → body | else → ccerror('IMPOSSIBLE'()) }}

function l0_classify_insn(i)
  visit:lang0flat1 /rec/ (stmt:i) {
     once stmt {
        tailapply → 'term'()
      | goto → 'term'()
      | gotoc → 'term'()
      | return → 'term'()
      | drop → 'skip'()
       | label → 'label'(id)
      | else → 'insn'()}}

function l0_seq_sanitise(ss)
collector(addinsn, getinsns) {
  relabelht = mkhash();
  addlabelmap(f, t) = ohashput(relabelht, f, t);
  do loop(l = ss, prevlbl = 'entry', prevterm = []) {
    match l with
      insn:rest → {
         class = l0_classify_insn(insn);
         match class with
            'label'(id) → 
               if(prevlbl) {
                  addlabelmap(id, prevlbl);
                   loop(rest, prevlbl, [])
                } else {
                   addinsn(insn);
                   loop(rest, id, [])}
           | 'term'() → 
                if (prevterm) {
                   loop(rest, [], true)
                } else {
                   addinsn(insn);
                   loop(rest, [], true)}
           | 'skip'() →
                loop(rest, [], [])
           | 'insn'() →
                if (prevterm) ccerror('IMPOSSIBLE'())
                else {
                   addinsn(insn);
                   loop(rest, [], [])}}};
  ss0 = getinsns();
  // Rename goto targets since we eliminated redundant labels
  return map s in ss0 do {
     fixlabel(lbl) =
            aif(chk = ohashget(relabelht, lbl)) {
               return chk
            } else lbl;
     visit:lang0flat1 /rec/ (stmt: s) {
        deep stmt {
            goto → mk:node(id=fixlabel(id))
         |  gotoc → mk:node(tr=fixlabel(tr), fl=fixlabel(fl))
         |  else  → node()}}}}

function l0_flat2(i)
   visit:lang0flat1 /rec, dst lang0flat2/ (stmt: i) {
      deep stmt {
          tailapply → mk:term:tail(mk:expr:apply(purep=[]))
        | goto → mk:term:goto()
        | gotoc → mk:term:gotoc()
        | return → mk:term:return()
        | label → ccerror('IMPOSSIBLE'())

        | alloca → mk:insn:alloca()
        | def → mk:insn:def()
        | set → mk:insn:set()
        | drop → ccerror('IMPOSSIBLE'())};
      deep expr {
          const → mk:value:const()
        | global → mk:value:global()
        | arg → mk:value:arg()
        | clvar → mk:value:clvar()
        | reg → mk:value:reg()
        | funref → mk:value:funref()

        | mkclosure → mk:expr:mkclosure()
        | apply → mk:expr:apply(purep=[])
        | load → mk:expr:load()
        }}

function l0_basic_blocks_stmts(s) {
  ss = l0_seq_sanitise(l0_seq_body(s));
  collector(bbadd, bbget) {
     do loop(l = ss, bb = 'entry', insns = []) {
        flush(term) = with target(lang0flat2) {
          bb = mk:bblock:bb(lbl = bb,
                            body = map i in insns do l0_flat2(i),
                            t = l0_flat2(term));
          bbadd(bb)};
        match l with
          insn:rest → {
             class = l0_classify_insn(insn);
             match class with
                'label'(id) → { // Assuming no flocking labels left
                   if (insns) ccerror('IMPOSSIBLE'());
                   loop(rest, id, [])}
              | 'insn'() → loop(rest, bb, insns ⊕ [insn])
              | 'skip'() → loop(rest, bb, insns)
              | 'term'() → {flush(insn); loop(rest, [], [])}}
        |  [] → {if(insns) ccerror('IMPOSSIBLE'())}};
     with target (lang0flat2) {
       return mk:code:c(bs = bbget())}}}

function l0_basic_blocks(ts) 
  map t in ts do
     visit:lang0flat1 /rec, dst lang0flat2/ (top: t) {
        once stmt { else → l0_basic_blocks_stmts(node()) }}

The code after basic blocks extraction and before the generic SSA transform is as follows:

(sfunction foldl (Z562128 Z562129 Z562130) (c 
    (bb entry (
               (alloca Z562128) 
               (alloca Z562129) 
               (alloca Z562130) 
               (set Z562128 (arg Z562128)) 
               (set Z562129 (arg Z562129)) 
               (set Z562130 (arg Z562130))) 
      (goto tailrecentry)) 
    (bb tailrecentry (
                      (alloca Z562131) 
                      (alloca Z562132) 
                      (def Z562145 (load Z562130)) 
                      (def Z562144 (apply  (global nullp) (reg Z562145)))) 
      (gotoc (reg Z562144) Z562146 Z562147)) 
    (bb Z562146 (
                 (def Z562133 (load Z562129))) 
      (return (reg Z562133))) 
    (bb Z562147 (
                 (def Z562135 (load Z562130)) 
                 (def Z562134 (apply  (global head) (reg Z562135))) 
                 (set Z562131 (reg Z562134)) 
                 (def Z562137 (load Z562130)) 
                 (def Z562136 (apply  (global tail) (reg Z562137))) 
                 (set Z562132 (reg Z562136)) 
                 (def Z562138 (load Z562128)) 
                 (set Z562128 (reg Z562138)) 
                 (def Z562140 (load Z562128)) 
                 (def Z562141 (load Z562129)) 
                 (def Z562142 (load Z562131)) 
                 (def Z562139 (apply  (reg Z562140) (reg Z562141) (reg Z562142))) 
                 (set Z562129 (reg Z562139)) 
                 (def Z562143 (load Z562132)) 
                 (set Z562130 (reg Z562143))) 
      (goto tailrecentry))))

Generic SSA

Now we're all set for an SSA--transform (i.e., memory to register). For this we'll use a specific simplified IR, not to be confused with a more complex Abstract SSA we'll use later.
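To get an intuition for what the mem2reg pass does, here is a deliberately tiny Python sketch of the single-block case: every store to a stack slot just remembers the value, and every load becomes a register rename (the real library routine additionally places phi nodes where control flow merges between blocks):

```python
def mem2reg_block(body, allocas):
    """Single-block mem2reg sketch: forward-substitute the latest
    stored value for each load of a stack slot."""
    slot = {}      # current value held by each stack slot
    rename = {}    # registers that turned out to be plain aliases
    out = []
    for i in body:
        if i[0] == "set" and i[1] in allocas:
            slot[i[1]] = i[2]                 # store: remember the value
        elif i[0] == "def" and i[2][0] == "load" and i[2][1] in slot:
            rename[i[1]] = slot[i[2][1]]      # load: the register is an alias
        else:
            out.append(i)
    return out, rename
```

Applying the collected rename map to the remaining instructions is exactly the job of the next pass, which maps the results back onto the original AST.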

function l0_to_genssa(c)
collector(liftadd, liftget)
collector(addalloca, getallocas) {
   vals = mkhash();
   addval(nm, v) = ohashput(vals, nm, v);
   newlift(v) =
      symbols(id) { addval(id, v); liftadd([id;'use'()]); id };
   getexitsterm(t) =
       collector(add, get) {
          visit:lang0flat2 /rec/ (term:t) {
            once labident : add(thisnodesrc())};
          return get()};
   getbbexits(bb) = collector(add, get) {
    visit:lang0flat2 /rec/ (bblock:bb) {
      once term { else → add(getexitsterm(node()))}};
    return car(get())};
   getcode(t) =
      visit:lang0flat2 /rec/ (term:t) {
         once term {
            tail → symbols(retn) [[retn;e]]
          | else → []}};
   gs = visit:lang0flat2 /rec/ (code:c) {
             deep code { c → bs };
             deep bblock {
                bb → 'b'(lbl, liftget() ⊕ (map append b in body do b) ⊕ getcode(t),
                         getbbexits(thisnodesrc()))};
             deep insn {
                alloca → {addalloca(id); []}
              | def    → [[id; v]]
              | set    → [[gensym(); 'store'(id, v)]]
              | drop   → []};
             deep expr {
                load   → 'load'(id)
              | apply  → 'use'(fn, @args)
              | mkclosure → 'use'(@args)};
             deep value {
                reg    → id
              | else   → newlift(node())}};
   allocas = getallocas();
   // Use the library function to run mem2reg pass on a generic SSA:
   nssa = ssa_transform(gs, allocas);
   return vals:nssa}

Back from the generic SSA

After the generic SSA transform, all allocas, loads and stores are eliminated, new phi nodes are introduced and equivalent registers renamed. Now we have to apply the results back to our original AST.
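The rename map produced by the transform may contain chains (a was renamed to b, which was later renamed to c), so the lookup has to follow them to the end, as getrenamed0 below does. In Python, assuming the chains are acyclic:

```python
def follow(vmap, n):
    """Follow a chain of register renames to its final name.
    Assumes the rename map contains no cycles."""
    while n in vmap:
        n = vmap[n]
    return n
```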

function l0_from_genssa(c, p) {
  <valsht:[vmap;ngen;DT]> = p;

   // Generic SSA transform collected register renames, we
   // have to follow them here. Also, lifted constant values were
   // given temporary register names in the previous transform,
   // we have to recover them back.
    getrenamed0(n) = {
       do loop(x = ohashget(vmap, n), p = n) {
          if(x) return loop(ohashget(vmap, x), x)
          else return p }};
   getrenamed(n) = with target (lang0flat2) {
       nreg = getrenamed0(n);
       aif(chk = ohashget(valsht, nreg)) {
         chk
       } else mk:value:reg(nreg)};

   isrenamed(n) = ohashget(vmap, n);

   // Now we can collect the phi nodes from the generic SSA representation,
   // and then add them to their corresponding source basic blocks.
   phis = mkhash();
   fillphis() =
      visit:genssa (code: ngen) {
          deep bblock { b → iter ops do ops(name) };
          deep oppair : λ(bb) { op(bb, name) };
          deep iop {
             phi → λ(bb, tgt) with target (lang0flat2) {
               nphi =
                  mk:insn:def(tgt,
                     mk:phi(ss =
                        map [f;v] in zip(prevs, vals) do
                           mk:s(f, getrenamed(v))));
               ohashput(phis, bb,
                   ohashget(phis, bb) ⊕ [nphi])}
          |  else → λ(bb, tgt) [] }};
   fillphis();
   getphis(lbl) = ohashget(phis, lbl);

   // Apply all the changes to the original lang0flat2 code
   visit:lang0flat2 /rec/ (code: c) {
        deep bblock {
           bb → {
              phis = getphis(lbl);
              mk:node(body = phis :: map append b in body do b)
           }};
        deep insn {
              alloca → []
           |  set → []
           |  def → if (isrenamed(id)) [] else [node()]
           |  else → [node()]};
        deep value {
              reg → getrenamed(id)
           |  else → node()}}}

And a simple wrapper function to run the generic SSA transform over top level entries:

function l0_genssa(ts) {
   map t in ts do
      visit:lang0flat2 /rec/ (top:t) {
         once code { else → l0_from_genssa(node(), l0_to_genssa(node()))
      }}}

Everything to this point

function l0_genssa_driver(src) {
  x3 = l0_flatten_driver(src);
  x4 = l0_basic_blocks(x3);
  x5 = l0_genssa(x4);
  return x5;
}

Now, back from the generic SSA, our foldl function looks like this:

(sfunction foldl (Z567409 Z567410 Z567411) (c 
    (bb entry () 
      (goto tailrecentry)) 
    (bb tailrecentry (
                      (def Z567446 
                        (phi 
                              (s Z567428 (reg Z567417)) 
                              (s entry (arg Z567411)))) 
                      (def Z567445 
                        (phi 
                              (s Z567428 (reg Z567420)) 
                              (s entry (arg Z567410)))) 
                      (def Z567444 
                        (phi 
                              (s Z567428 (reg Z567444)) 
                              (s entry (arg Z567409)))) 
                      (def Z567425 (apply  (global nullp) (reg Z567446)))) 
      (gotoc (reg Z567425) Z567427 Z567428)) 
    (bb Z567427 () 
      (return (reg Z567445))) 
    (bb Z567428 (
                 (def Z567415 (apply  (global head) (reg Z567446))) 
                 (def Z567417 (apply  (global tail) (reg Z567446))) 
                 (def Z567420 (apply  (reg Z567444) (reg Z567445) (reg Z567415)))) 
      (goto tailrecentry))))

The control flow graph for this example is as follows:

Abstract SSA

The next few sections are entirely optional. We already have an IR suitable for code generation, but it is also fit for SSA-based optimisations. MBase provides an abstract SSA library which, with the help of a user-defined IR model, provides, among others, the following transforms:

For the implementation details see ssa-fold.pdf.

Some of these optimisations are enabling ones and may be useful for our code generation even if the target platform is capable of doing similar optimisations on a lower level.

Keep in mind that Abstract SSA is an old, list-form AST, and by converting our nice recform AST into it we're discarding any associated metadata (most importantly, source code locations). Also, some semantics is lost and must be recovered on the way back (e.g., both arguments and closure variables are represented as registers).

We did not have any select instructions before, but they may appear after the Abstract SSA optimisations. The backend must be aware of this.

function l0_to_abstract_ssa_code(c) {
  isintrinsicfn(id) = {
    case(id) {
       'binopadd' | 'binopsub' | 'binopmul'
     | 'binopdiv'
     | 'binopeq' | 'binopneq' → id
     | else → []
    }};
  isintrinsic(f) =
    {match f with
       'glob'(id) → isintrinsicfn(id)
     | else → []};
  visit:lang0flat2 /rec/ (code: c) {
    deep code { c → bs };
    deep bblock {
       bb → {
         <te:t1> = t;
          'b'(lbl, body :: te, t1)}};
    deep insn {
       def → [id; v]
     | else → ccerror('IMPOSSIBLE'())};
    deep term {
       goto → []:'br'(id)
     | gotoc → []:'brc'(cnd, tr, fl)
     | return → [[gensym();'call'(['other'('src'(thisnodesrc()))], '*return*', v)]]:'none'()
     | tail → symbols(ret)
                   [[ret;e];
                    [gensym();'call'(['other'('src'(thisnodesrc()))], '*tailreturn*', 'var'(ret))]]:
                       'none'()};
    deep expr {
       mkclosure → 'call'(['other'('src'(thisnodesrc()))], '*mkclosure*', 'glob'(df), @args)
     | apply → aif (intr = isintrinsic(fn)) {
                  'call'(['other'('src'(thisnodesrc()))],
                         intr, @args)
                } else 'call'(['other'('src'(thisnodesrc()))], '*funcall*', fn, @args)
     | load → ccerror('IMPOSSIBLE'())
     | phi → 'phi'(@ss)
     | select → 'select'(cnd, tr, fl)};
    deep phisrc {
       s → 'a'(from, v)};
    deep value {
       const → 'const'('any'(), c)
     | global → 'glob'(id)
     | arg → 'var'(id)
     | clvar → 'var'(id)
     | reg → 'var'(id)
     | funref → 'glob'(id)}}}

function l0_to_abstract_ssa(ts) {
  map append t in ts do 
    visit:lang0flat2 /rec/ (top: t) {
     deep top {
        sfunction → ['f'(id, 'any'(), map a in args do ['any'(); a],
                          l0_to_abstract_ssa_code(body))]
      | closure → ['f'(id, 'any'(), [@map a in args do ['any'(); a];
                                      @map a in clargs do ['any'(); a]],
                          l0_to_abstract_ssa_code(body))]
      | else → []}}}

Model

The Abstract SSA implementation is IR-agnostic and therefore must be provided with some hints about the nature of this particular IR. We have to be able to tell pure intrinsics from those that might have side effects in order to get constant folding working. We have to provide a way to evaluate those intrinsics when all of their arguments are constant. We must tell how to create a boolean constant and how to tell if a constant is a boolean truth. All such things form a model which complements the abstract SSA passes.
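To make the idea concrete, here is a rough Python sketch of such a model and of the folding decision it enables (the dictionary layout is invented for illustration; the real model is a hash table of closures):

```python
def make_model():
    # For every intrinsic: is it pure, and how to evaluate it over
    # constant operands. Constants are ("const", value) pairs.
    def binfun(fn):
        return lambda args: ("const", fn(args[0][1], args[1][1]))
    return {
        "binopadd": {"purep": True, "evalfun": binfun(lambda l, r: l + r)},
        "binopmul": {"purep": True, "evalfun": binfun(lambda l, r: l * r)},
        "binopeq":  {"purep": True, "evalfun": binfun(lambda l, r: l == r)},
    }

def try_fold(model, fn, args):
    # Fold a call only if the intrinsic is known, pure, and all of
    # its arguments are constants; otherwise leave it alone.
    m = model.get(fn)
    if m and m["purep"] and all(a[0] == "const" for a in args):
        return m["evalfun"](args)
    return None

model = make_model()
print(try_fold(model, "binopadd", [("const", 2), ("const", 3)]))  # ('const', 5)
print(try_fold(model, "binopadd", [("const", 2), ("var", "x")]))  # None
```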

function l0_make_model() {
   ht = mkhash();

   getconst(c) = {
     vc = (match c with 'const'(tp, v) → v);
     visit:lang0flat2 /rec/ (const: vc) {
      once const {
         number → n
       | string → s
       | symbol → s
       | nil → []}}};

   istrueconst(c) = getconst(c);

   
   mknumconst(c) =
     with target (lang0flat2) {
       'const'('any'(), mk:const:number(c))};

   mkboolconst(c) =
     with target (lang0flat2) {
       'const'('any'(), if(c) mk:const:symbol('t')
                         else mk:const:nil())};

   mkbinfun(fn) = 
     λ(args) {
        <[l;r]> = args;
         vl = getconst(l);
         vr = getconst(r);
         tmp = fn(vl, vr);
         return mknumconst(tmp) };
   mkboolfun(fn) = 
     λ(args) {
       <[l;r]> = args;
        vl = getconst(l);
        vr = getconst(r);
        tmp = fn(vl, vr);
        return mkboolconst(tmp)};

   iter [nm;c;p;efun;@rst] in [
      ['binopadd';1;1;mkbinfun( %+ );'add'];
      ['binopsub';1;1;mkbinfun( %- );'sub'];
      ['binopmul';1;1;mkbinfun( %* );'mul'];
      ['binopdiv';1;1;mkbinfun( %/ );'div'];
      
      ['binopeq';1;1;mkboolfun( λ(l,r) l==r );'eq'];
      ['binopneq';1;1;mkboolfun( λ(l,r) l!=r );'neq']
   ] do {
     cls = if (rst) car(rst) else [];
     ohashput(ht, nm, λ(tg) {
        match tg with
           'constantp' → c
         | 'purep' → p
         | 'evalfun' → efun
         | 'classify' → cls})};

   ohashput(ht, '*true?*', istrueconst);
   ohashput(ht, '*boolean-type*', 'any'());
   ohashput(ht, '*type-maker*', λ(lenv, t) {'any'()});
   ohashput(ht, '*ctype-maker*', λ(lenv, t, c) {'any'()});
   ohashput(ht, '*type-equation-maker*', λ(dst) {});

   ohashput(ht, '*get-integer-constant*', λ(tp, vl) {
      });
   return ht}

define l0_default_model = l0_make_model()

Optimisations

function l0_genssa_opt(ts)
  map t in ts do {
     t0 = l0_to_abstract_ssa([t]);
     if (t0) {
      <ht:nv> = genssa2_process(l0_default_model, mkhash(), car(t0));
       return nv}
     }

Back from the abstract SSA

function l0_from_abstract_ssa_code(varht, c)
with target (lang0flat2) {
   classvar(id) = aif(chk = ohashget(varht, id)) chk else 'var';
   getglob(v) =
     visit:lang0flat2 /rec/ (value: v) {
        once value {
           global → id
         | else → ccerror('IMPOSSIBLE'())}};
   visit:genssa2 (code: c) {
      deep code: {mk:code:c(bs)};
      deep bblock {
        b → collector(addx, getx) {
          nops = map append o in ops do o(addx);
          nt = getx();
          mk:bblock:bb(lbl=name,
                       body=nops,
                       t=if(nt) car(nt) else t)}};
      deep oppair: λ(addx) {op(addx, name)};
      deep iop {
          phi → λ(addx, name)
             [mk:insn:def(name, mk:phi(ss = args))]
        | select → λ(addx, name)
             [mk:insn:def(name, mk:select(cnd, t, f))]
        | call → λ(addx, dstname) {
             match dst with
                '*tailreturn*' → {
                   // Override the terminal and remove the argument
                   // from the insns list
                   addx(mk:term:return(car(args)));
                   return [] }
              | '*funcall*' →
                   [mk:insn:def(dstname,
                                mk:apply([], car(args), cdr(args)))]
              | '*mkclosure*' →
                   [mk:insn:def(dstname,
                                mk:mkclosure(getglob(car(args)), cdr(args)))]
              | '*return*' → {
                   addx(mk:term:return(car(args)));
                   return [] }
              | else → { // intrinsic
                   [mk:insn:def(dstname,
                                mk:apply([], mk:global(dst), args))]
                }}};
      deep expr {
        var → {match classvar(id) with
                   'var' → mk:value:reg(id)
                 | 'arg' → mk:value:arg(id)
                 | 'clvar' → mk:value:clvar(id)
                 | else → ccerror('IMPOSSIBLE'())}
      | glob → mk:value:global(id)
      | const → mk:value:const(v)
      | other → ccerror('IMPOSSIBLE'())};
      deep phiarg {
        a → mk:phisrc:s(from = src, v = v)};
      deep term {
        br  → mk:term:goto(dst)
      | brc → mk:term:gotoc(c, tr, fl)
      | switch → ccerror('IMPOSSIBLE'())
      | none → mk:term:return(mk:reg('*WTF*'))};
      }}

function l0_varht(t) {
   ht = mkhash();
   visit:lang0flat2 /rec/ (top: t) {
      once top {
         sfunction → { iter a in args do ohashput(ht, a, 'arg'); }
       | closure → { iter a in args do ohashput(ht, a, 'arg');
                      iter c in clargs do ohashput(ht, c, 'clvar')}
       | else → }};
   return ht}

function l0_do_abstract_ssa(ts) {
  getcode(t) =
    visit:genssa2 (top: t) {
       once top { f → body }};
  map t in ts do {
    visit:lang0flat2 /rec/ (top: t) {
     once top {
        sfunction → {
           assa = 'f'(id, 'any'(), map a in args do ['any'(); a],
                          l0_to_abstract_ssa_code(body));
           varht = l0_varht(thisnodesrc());
          <ht:nv> = genssa2_process(l0_default_model, mkhash(), assa);
           back = l0_from_abstract_ssa_code(varht, getcode(nv));
           mk:node(body = back)}
      | closure → {
           assa = 'f'(id, 'any'(), [@map a in args do ['any'(); a];
                                     @map a in clargs do ['any'(); a]],
                          l0_to_abstract_ssa_code(body));
           varht = l0_varht(thisnodesrc());
          <ht:nv> = genssa2_process(l0_default_model, mkhash(), assa);
           back = l0_from_abstract_ssa_code(varht, getcode(nv));
           mk:node(body = back)}
      | else → node()}}}}

Feedback codegen backend

Since this language allows compile time macros, we need two different backends: one for emitting the final code, and another for running the code that might be used by the macros. For the latter we'll implement a primitive .NET backend, partly because it's an opportunity to showcase a few patterns in compiling an SSA IR into a stack-based target. It will consist of the following steps.

One might ask why we went through all those complex rewrite steps only to get back to expression trees again. Firstly, the expression trees are a bit different now: they do not contain any control flow inside, as it is already broken into basic blocks. Secondly, we had an opportunity to do all the SSA-based transforms which are not possible on a primitive expression tree based IR. And, a bit ironically, we're doing redundant work here by making our IR stack-friendly: the .NET engine will quickly turn it back into SSA again at the first opportunity.

Out of SSA

function l0_naive_phi_removal(c)
   collector(newvaradd, newvarsget) {
      srcbbs = mkhash();
      // 1. Collect phi variables and the values to
      //    be assigned to them.
      getphis(c) =
        visit:lang0flat2 /rec/ (code: c) {
           deep insn {  def → v(id)
                      | else → };
           deep term {  tail → e('*tail*')
                      | else → };
           deep expr(dst) {
              phi → { newvaradd(dst);
                       iter [s;v] in ss do {
                          ohashput(srcbbs, s,
                            [dst;v]:ohashget(srcbbs, s))}}
            | else → };
           deep phisrc { s → [from;v] }};
      getphis(c);
      // 2. Rewrite the phi origin basic blocks and
      //    add all the allocas to the 'entry' basic block.
      visit:lang0flat2 /rec/ (code: c) {
         deep bblock {
           bb → {
              pfx = if (lbl === 'entry') {
                 map v in newvarsget() do {
                    vnm = %Sm<<(v, "_alloc");
                    mk:insn:alloca(vnm)}} else [];
              sfx = {
                 chk = ohashget(srcbbs, lbl);
                 map [dst;v] in chk do
                   mk:insn:set(%Sm<<(dst, "_alloc"), v)};
              mk:node(body = pfx :: body :: sfx)}};
         deep term { tail → e('*tail*') | else → node() };
         deep insn { def → mk:node(v = v(id)) | else → node() };
         deep expr(dst) {
            phi → mk:load(%Sm<<(dst, "_alloc"))
          | else → node()}}}

The only cases where a register is set to load(something) come from the previous pass, so it is safe to demote all such loads to values.
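The demotion itself is trivial; a Python sketch of the same idea over a hypothetical tuple-based IR (for illustration only, not the MBase pass):

```python
def reduce_loads(insns):
    # insns: list of (dst, expr) pairs; an expr is a tuple tree whose
    # leaves may be ("reg", name). Registers defined as ("load", x)
    # are dropped, and their uses become load values directly.
    lht = {d: e[1] for d, e in insns if e[0] == "load"}
    def rewrite(e):
        if e[0] == "reg" and e[1] in lht:
            return ("load", lht[e[1]])
        return (e[0],) + tuple(rewrite(x) if isinstance(x, tuple) else x
                               for x in e[1:])
    return [(d, rewrite(e)) for d, e in insns if d not in lht]

print(reduce_loads([("r1", ("load", "x_alloc")),
                    ("r2", ("add", ("reg", "r1"), ("const", 1)))]))
# [('r2', ('add', ('load', 'x_alloc'), ('const', 1)))]
```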

ast lang0flat2x : lang0flat2 () recform {
  value += load(ident:id);
}

function l0_reduce_loads(c) {
   lht = mkhash();
   fill() = visit:lang0flat2 /rec/ (code: c) {
      deep expr(dst) {
          load → ohashput(lht, dst, id)
        | else → };
       deep term {
          tail → e('_') | else → };
       deep insn { def → v(id) | else → }};
   fill();
   visit:lang0flat2 /rec, dst lang0flat2x/ (code: c) {
       deep insn { def → if (ohashget(lht, id)) [] else [node()]
                 | else → [node()]};
       deep bblock {
            bb → mk:node(body = map append b in body do b)};
       deep value {
            reg →
              aif (chk = ohashget(lht, id))
                mk:value:load(chk)
              else node()}}}

Back to trees

.NET target is a stack machine, so it makes sense to fold the sequential single use register assignments back into expression trees.
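The core of this folding, minus the ordering constraints, can be sketched in Python (a hypothetical tuple IR; the real pass below must also refuse substitutions that would reorder side-effecting expressions):

```python
from collections import Counter

def fold_single_use(insns):
    # insns: ordered list of (dst, expr); expr leaves may be ("reg", name).
    # A register defined in this block and used exactly once is
    # substituted into its single use site, rebuilding a tree.
    uses = Counter()
    def count(e):
        if e[0] == "reg":
            uses[e[1]] += 1
        else:
            for x in e[1:]:
                if isinstance(x, tuple): count(x)
    for _, e in insns: count(e)
    defs = dict(insns)
    removed = set()
    def subst(e):
        if e[0] == "reg" and uses[e[1]] == 1 and e[1] in defs:
            removed.add(e[1])
            return subst(defs[e[1]])
        return (e[0],) + tuple(subst(x) if isinstance(x, tuple) else x
                               for x in e[1:])
    out = []
    # Walk in reverse so every use is processed before its definition.
    for d, e in reversed(insns):
        if d in removed: continue
        out.append((d, subst(e)))
    out.reverse()
    return out

print(fold_single_use([("a", ("const", 1)),
                       ("b", ("const", 2)),
                       ("c", ("add", ("reg", "a"), ("reg", "b")))]))
# [('c', ('add', ('const', 1), ('const', 2)))]
```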

ast lang0unflat1 : lang0flat2x () recform {
  value += expr(expr:e)
         | load(ident:id)
         ;
}

function l0_is_ordered(e) {
  //TODO!!!
  return true
}

function l0_bblock_to_trees(cnt, bb)
collector(ordlistadd, ordget)
collector(useadd, useget) {
  // 1. Collect the register definitions and
  //    fill the ordered list.
  defsht = mkhash(); ordered=mkhash();
  ordadd(id) = {ordlistadd(id); ohashput(ordered,id,id)};
  fillht() = 
    visit:lang0flat2x /rec/ (bblock: bb) {
      once insn {
         def → {
            ohashput(defsht, id, v);
            if (l0_is_ordered(v)) ordadd(id);}
       | else → }};
  fillht();
  // 2. See which register references defy The Order.
  mkord() =
     visit:lang0flat2x /rec/ (bblock: bb) {
        once value {
           reg →
              if (ohashget(ordered, id))
                 useadd(id)
         | else → }};
  mkord();
  outoford = mkhash();
  do loop(defs = ordget(), uses = useget()) {
     if (not(defs) || not(uses)) 
     else {
        d = car(defs); u = car(uses);
        if (d === u) loop(cdr(defs), cdr(uses)) else
        {
           ohashput(outoford, d, d);
           loop(cdr(defs), uses) // TODO!!! wrong!!!
        }}};
  // 3. Walk in the reverse order and substitute the single-use definitions,
  //    as long as this substitution does not break an order.
  getinsns() =
     visit:lang0flat2x /rec/ (bblock: bb) {
        deep bblock {
           bb → t :: reverse(body)};
        once term {
           tail → [mk:insn:def('_', e)]
         | gotoc → [mk:insn:def('_', mk:value(cnd))]
         | return → [mk:insn:def('_', mk:value(v))]
         | else → }};
  removedht = mkhash();
  removed(id) = ohashget(removedht, id);
  insns = getinsns();
  process_expr(e0) =
  do loop(e = e0) {
     visit:lang0flat2x /rec, dst lang0unflat1/ (expr: e) {
        once value {
           reg →
              if (ohashget(cnt, id) == 1) {
                // is it defined in the same bb?
                aif (chk = ohashget(defsht, id)) {
                  if (not (ohashget(outoford, id))) {
                    ohashput(removedht, id, id);
                    mk:value:expr( loop (chk) )
                  } else node()
                } else node()
              } else node()
         | else → node()}}};
  newinsns = map append i in insns do
     visit:lang0flat2x /rec, dst lang0unflat1/ (insn:i) {
       once insn {
          def → if (removed(id)) [] else [mk:node(v = process_expr(v))]
        | set → [mk:node(v = mk:expr(process_expr(mk:expr:value(v))))]
       | else → [node()]}};
  getexpr(i) = {
     ret = mkref();
     visit:lang0unflat1 /rec/ (insn:i) {
        once expr { else → ret := node() }};
     return ^ret};
  getvalue(i) = {
     ret = mkref();
     visit:lang0unflat1 /rec/ (insn:i) {
        once value { else → ret := node() }};
     return ^ret};
  reapplyinsns(lst) =
     visit:lang0flat2x /rec, dst lang0unflat1/ (bblock: bb) {
       deep bblock {
          bb → {
            <chgp:nt> = t;
             println('lbl'(lbl, chgp));
             tmp = gen_pprint_ast(lang0unflat1, insn);
             iter x in lst do println(%S<<(":: ", tmp(x)));
             mk:node(body = reverse(if(chgp) cdr(lst) else lst),
                     t = nt)}};
        deep term {
          tail → true: mk:node(e = getexpr(car(lst)))
        | gotoc → true: mk:node(cnd = getvalue(car(lst)))
        | return → true: mk:node(v = getvalue(car(lst)))
        | else → []:node() }};
  return reapplyinsns(newinsns)}

function l0_bblocks_to_trees(c) {
  // 1. We can only move the single use registers. Let's count the number of
  //    uses for all the registers.
  countht = mkhash();
  count(id) = { nxt = aif(chk=ohashget(countht, id)) chk+1 else 1;
                ohashput(countht, id, nxt) };
  mkcount() =
    visit:lang0flat2x /rec/ (code: c) {
      once value { reg → count(id) | else → }};
  mkcount();
  // 2. For each basic block we maintain the order of expressions that cannot be reordered and move everything
  //    else freely.
  mktree() =
    visit:lang0flat2x /rec, dst lang0unflat1/ (code:c) {
      once bblock { bb → l0_bblock_to_trees(countht, node()) }};
  nxt = mktree();
  // 3. Use the count table again, to remove the unused register names
  visit:lang0unflat1 /rec/ (code: nxt) {
     deep insn {
        def → if (not(ohashget(countht, id))) mk:node(id = []) else node()
      | else → node()}}}

No more registers

By now this is just a nominal pass: demote the remaining registers to loads/stores and issue explicit allocas for them.

function l0_registers_are_not_welcome(c)
collector(addreg, getregs) {
   genregs() = 
     visit:lang0unflat1 /rec/ (code: c) {
        once insn {
           def → if(id) addreg(mk:insn:alloca(id)) | else → }};
   genregs();

   visit:lang0unflat1 /rec/ (code: c) {
      deep bblock {
         bb → if(lbl === 'entry') mk:node(body = getregs() :: body)
               else node()};
      deep value {
         reg → mk:load(id)
       | else → node()};
      deep insn {
         def → if(id) mk:set(id = id, v = mk:expr(v))
                else node()
       | else → node()}}}

Environment

We managed to keep the compiler clean and self-contained up until now, but for a real backend we have to deal with a runtime library and a global environment, so this final pass will have to keep looking things up and keep track of the classes, fields and methods it generated. The environment contains references to some core library methods (e.g., a select function), FFI methods, the already compiled classes and methods and, in another layer, the fields and methods of the class that is being compiled now.

.NET codegen

The general approach is simple: for every chunk of code (a number of definitions) a class is created. Constants are stored in static fields, and all functions are turned into static methods. A special 'init' method is generated to fill all the static fields. We're not using static constructors here because some side-effecting actions of this init method may not touch any of this class's static fields but affect other global variables instead, so we have to ensure the initialisation order explicitly.
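The stack-discipline part of the codegen is a plain post-order walk; here is a Python sketch of the idea (instruction names mimic CIL, and the tuple shapes are invented for illustration):

```python
def emit(e):
    # Post-order: emit code for the operands first, leaving their
    # values on the stack, then emit the operation consuming them.
    op = e[0]
    if op == "const":
        return [("Ldc", e[1])]
    if op == "local":
        return [("Ldloc", e[1])]
    # a generic call: push arguments left to right, then invoke
    code = []
    for arg in e[1:]:
        code += emit(arg)
    return code + [("Call", op, len(e) - 1)]

print(emit(("add", ("const", 2), ("local", "x"))))
# [('Ldc', 2), ('Ldloc', 'x'), ('Call', 'add', 2)]
```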

function l0_feedback_dumb(env, c)
  visit:lang0unflat1 /rec/ (code: c) {
     deep bblock {
        bb → ['label'(lbl); @map append b in body do b; @t]};
     deep code { c → map append b in bs do b };
     deep insn {
        alloca → ['local'(id, t_object)]
      | def → if (id) ccerror('IMPOSSIBLE'()) else
                 [@v; 'Pop'()]
      | set → [@v; 'Stloc'('var'(id))]};
     deep expr {
        mkclosure → [@map append a in args do a; @l0_compile_mkclosure(env, df, length(args))]
      | apply → [@fn; @map append a in args do a; @l0_compile_run_method(env, length(args))]
      | load → ['Ldloc'('var'(id))]
      | phi → ccerror('IMPOSSIBLE'())
      | value → v
      | select → [@tr; @fl; @cnd; @l0_compile_select_function(env)]};
     deep value {
        expr → e
      | const → l0_compile_constant(env, c)
      | global → l0_compile_global(env, id)
      | arg → ['Ldarg'(l0_map_argument(env, id))]
      | clvar → ['Ldarg_0'(); 'Ldfld'('field'(l0_map_clenv(env, id)))]
      | reg → ccerror('IMPOSSIBLE'())
      | load → ['Ldloc'('var'(id))]
      | funref → l0_compile_global(env, id)};
     deep term {
        goto → ['Br'('label'(id))]
      | gotoc → [@cnd; @l0_compile_cndcheck(env); 'Brc'('label'(tr)); 'Br'('label'(fl))]
      | return → [@v; 'Ret'()]
      | tail → [@e; 'Ret'()]}}

Driver

function l0_net_prep(ts)
   map t in ts do {
      visit:lang0flat2 /rec, dst lang0unflat1/ (top: t) {
         once code {
           else → l0_registers_are_not_welcome(
                    l0_bblocks_to_trees(
                     l0_reduce_loads(
                      l0_naive_phi_removal(node())))) }}}

function l0_net_backend(env, ts) {
   x1 = l0_net_prep(ts);
   tmp = gen_pprint_ast(lang0unflat1, top);
   iter x in x1 do println(tmp(x));
   println("--------------------");
   x2 = map x in x1 do
          visit:lang0unflat1 /rec/ (top: x) {
             once code {
               c → { tmp1 = l0_feedback_dumb(env, node());
                      iter t in tmp1 do println(t);
                      println("========="); }}};
   return ts;
}

Top level driver

Bringing all the passes together here. The full pipeline (pass dependency graph omitted) runs in the following order: l0_expand_dummy, l0_eliminate_apply2, l0_eliminate_if2, l0_eliminate_list, l0_lowering_driver (l0_split_statements, l0_lower_splitexprs), l0_flatten_driver (l0_flatten, l0_flatten_cfg), l0_genssa_driver (l0_basic_blocks, l0_genssa), and finally l0_do_abstract_ssa.

function l0_compiler_driver(src) {
  x1 = l0_genssa_driver(src);
  x2 = l0_do_abstract_ssa(x1);
  return x2
}