There are a lot of tutorials out there on how to write a simple interpreter or a compiler, and there is most
certainly no need for just another one of that kind. This tutorial is the opposite: it tells how to write a
complex compiler with all the bells and whistles, while still leaving the unimportant details out and
keeping things easy to understand.
As a source language we're using a very simple, somewhat functional language. It is eager and expression--oriented,
with no explicit destructive assignment to variables; functions may have I/O and memory side effects
(via some impure intrinsics). There are no explicit looping constructs besides recursion, and tail recursion
optimisation is guaranteed by the compiler.
We build a compiler on top of this simple language, and then extend the core language with more features
added as syntax extensions and compile-time macros, demonstrating a number of important approaches
working together.
There is no intention to write just another short and dense compiler. We're fully embracing
the nanopass style of compiler construction, therefore the code is deliberately verbose; on the other hand,
every step made in the compilation pipeline is easy to understand and easy to reason about.
This compiler takes its eager functional source language and reduces it to an imperative
low level intermediate representation, so both functional and imperative compilation techniques
are covered here, with more emphasis on the imperative side (e.g., we do not go the CPS way,
using an SSA representation instead).
Naming languages is hard and does not make much sense here, so we'll simply call our source language lang0,
with all the subsequent IR names derived from this notation.
Topics covered in this tutorial include (but are not limited to):
This is also a showcase for a range of new MBase features - recform AST support, the abstract SSA library,
AST pretty printing, etc.
Initial AST
The language starts with a definition of its Abstract Syntax Tree. This is the very first AST in our compilation
chain; it is generated directly by the parser frontend and contains some syntax sugar that ought to be
eliminated straight away.
Since this is a "functional" (sort of) language, everything is an expression. We will split statements and
expressions further down the compilation pipeline.
ast lang0src : recform {
   // Toplevel statements - definitions and raw expressions
   top = define(ident:id, expr:e)
       | defmacro(ident:id, expr:e)
       | expr(expr:e)
       | debug(.*any:es)
       ;
   // All possible expressions
   expr = lambda(*ident:args, expr:body)
        | seq(.*expr:body)
        // This one does not come from a parser, but is
        // produced by a lexical scoping pass later on.
        | elambda(*ident:args, lenv:benv, lenv:fenv, expr:body)
        // And the following is for macros
        | toplift(top:t)
        | quasiquote(expr:e)
        | unquote(expr:e) // only valid inside a quasiquote expression
        ;
   // All possible constant literals
   const = number(any:n)
         | string(any:s)
         | symbol(any:s)
         | nil()
         ;
   // A let binding node
   letpair = p(ident:id, expr:v);
   ident = unquote(expr:e)
         | v(vident:id)
         ;
}
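If the recform notation looks unfamiliar, the same shape can be sketched as plain OCaml variant types. This is an illustration only, not MBase code, and the lenv representation here is my assumption:

(* A sketch of the lang0src AST as mutually recursive OCaml variants. *)
type ident =
  | IUnquote of expr            (* unquote in identifier position *)
  | IV of string                (* a plain variable identifier *)

and constant =
  | Number of int
  | String of string
  | Symbol of string
  | Nil

and letpair = P of ident * expr (* one let binding: name = value *)

and lenv = (string * string) list (* bound/free variable lists (assumed) *)

and expr =
  | Lambda of ident list * expr
  | Seq of expr list
  | Elambda of ident list * lenv * lenv * expr (* added by lexical scoping *)
  | Toplift of top              (* macro support *)
  | Quasiquote of expr
  | Unquote of expr             (* only valid inside a quasiquote *)
  | Const of constant

and top =
  | Define of ident * expr
  | Defmacro of ident * expr
  | Expr of expr
  | Debug of expr list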
Very often we will have to define a new AST which is only slightly different from
another one. It does not make sense to copy-paste the same thing over and over
again. Instead, new ASTs should be rewritten from the existing ones.
The language after macro expansion will be the following:
As we have defined the structure of this language, we can start building a parser for it.
Parsers in PFront are based on an extended PEG. Normally, a new language
parser will inherit one of the existing parsers, to avoid redefining common stuff
like whitespace, comments, identifiers, number and string literals and so on. In
this case we're inheriting the PFront language entirely, though we only use a few
primitive nodes from it.
Let's not dwell on it, ok? Far too many compiler tutorials are stuck on
parsing, and parsing is the least important part of a language implementation.
parser plang0 (pfront) {
   // A target AST produced by this parser
   target lang0src;
   // Which node should be used as a whitespace - inherited from PFront
   !!Spaces;
   // Rules - for syntax highlighting and for separating keywords
   // from identifiers
   [lexical:] ⇐ [lexical] ⇒ {ctoken = lexic};
   [keyword:] ⇐ [keyword] ![IdentRest] ⇒ {ctoken = keyword};
   // Top entry node - returns a list of lang0 toplevel statements.
   // Note the trailing whitespaces - it is important to add them to a
   // top node, since the whitespaces are only implicitly ignored at
   // beginnings of tokens.
   plang0 ⇐ eslist<[l0topexpr]>:ts [Spaces]* ⇒ ts;
Logically, this is the next step straight after parsing. It is, however, a
relatively advanced subject that should probably be skipped at first, going straight
to the syntax sugar lowering. Feel free to come back and
review this section after reading about the runtime-compiler feedback loop.
At this moment we don't have any feedback loop; it can only be defined later,
when the rest of the compiler pipeline is finalised. We'll leave an empty stub
here meanwhile.
Unfortunately, macros imply a very early exposure to the concept of an environment,
which, again, we'd prefer to leave for much later down the pipeline. We'll keep
a dummy stub here instead.
function l0_env_find_macro(globenv, fn) {
   getid(fn) =
      visit:lang0src /rec/ (expr: fn) {
         deep expr {
            var → id
          | else → ∅};
         deep ident {
            v → id
          | else → ∅}};
   (^l0_env_find_macro_hook)(globenv, getid(fn))}
In order to be able to implement macros we need access to AST-constructing
functions. The easiest way of abstracting them out nicely is to use
quasiquotation: any AST inside a quasiquote is translated into
AST-constructing code that calls the appropriate runtime library functions.
The expression expansion pass handles the quasiquote nodes (consider
quasiquote a special kind of "macro") and the user-defined macros applied via the
compiler feedback loop.
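To make the idea concrete, here is a minimal quasiquote expander in OCaml (not PFront) over a toy AST. The mk_sym/mk_str/mk_app constructor names are hypothetical stand-ins for the runtime library functions, and the real pass also has to track quasiquote nesting levels, which is glossed over here:

(* A quoted tree becomes code that *constructs* that tree at runtime;
   unquote splices the user's expression in unchanged. *)
type expr =
  | Sym of string
  | Str of string
  | App of expr * expr list
  | Quasiquote of expr
  | Unquote of expr

(* expand e = an expression that, when evaluated, builds e *)
let rec expand (e : expr) : expr =
  match e with
  | Sym s -> App (Sym "mk_sym", [Str s])
  | Str s -> App (Sym "mk_str", [Str s])
  | App (f, args) -> App (Sym "mk_app", expand f :: List.map expand args)
  | Unquote inner -> inner            (* escape hatch: keep user code *)
  | Quasiquote inner -> expand inner  (* nested quasiquote: simplified *)

let expand_expr (e : expr) : expr =
  match e with
  | Quasiquote inner -> expand inner
  | other -> other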
Now all the quasiquotations, macro applications and top lift nodes are eliminated, and we can turn our AST
into a form suitable for the rest of the compiler pipeline.
function l0_postexpand_sanitise(ts)
   map t in ts do
      visit:lang0src /rec, dst lang0/ (top: t) {
         deep expr {
            toplift → ccerror('IMPOSSIBLE'())
          | quasiquote → ccerror('IMPOSSIBLE'())
          | unquote → ccerror('IMPOSSIBLE'())
          | else → node()};
         deep ident {
            unquote → ccerror('IMPOSSIBLE'())
          | v → id}}
For the compiler pipeline demo purposes we need a dummy environment:
Our first AST is designed for parsing, but it is not very useful for further analysis, optimisation and
code generation. Before we start actually compiling anything we have to eliminate some syntax sugar and make this
AST simpler.
Eliminating syntax sugar
The two-argument application node was only introduced to simplify binary expression parsing and must go.
This is our first introduction to visitors - a fundamental building block in PFront. A visitor walks
over an AST in a defined order and builds a new AST, transforming certain nodes with the given expressions. In this
case we only rewrite the apply2 variant of the expr node, with all the boilerplate covering all possible
paths to expr nodes being inferred automatically.
Variant constructors like mk:apply are context-dependent: in this visitor we know that the resulting
AST is the same as the original AST, so mk:apply constructs lang0:expr:apply variants.
This function takes a list of toplevel statements as an argument, and returns a list of transformed toplevel
statements.
function l0_eliminate_apply2(ts)
   // 'ts' is a list of top statements,
   // we'll visit them one by one
   map t in ts do
      // A visitor block:
      //  * The source AST is 'lang0', destination is the same
      //  * It's a recform AST (not an old list-based one)
      //  * The entry point is a 'top' node
      visit:lang0 /rec/ (top: t) {
         // All the 'expr' nodes are visited in a depth-first order
         deep expr {
            // All the 'apply2' variants of 'expr' are rewritten
            // as 'apply' variants
            apply2 → mk:apply(fn = mk:var(fn), args = [L; R])
            // All the other variants will remain the same
          | else → node()}}
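For comparison, here is the same rewrite spelled out by hand in OCaml (illustrative types and names, not part of the real compiler). It also shows how much traversal boilerplate the visitor infers for us:

(* App2, a parser convenience for binary operators, becomes a generic
   Apply with a two-element argument list. *)
type expr =
  | Var of string
  | App2 of string * expr * expr   (* op, lhs, rhs - parser sugar *)
  | Apply of expr * expr list
  | Const of int

let rec eliminate_apply2 (e : expr) : expr =
  match e with
  | App2 (op, l, r) ->
      Apply (Var op, [eliminate_apply2 l; eliminate_apply2 r])
  | Apply (f, args) ->
      Apply (eliminate_apply2 f, List.map eliminate_apply2 args)
  | Var _ | Const _ -> e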
A two-armed if (i.e., one with no else branch) does not make sense in an expression--based language, so it must go:
the implicit false branch returns a nil constant.
And a list constructor is just a sequence of nested cons applications. We do not care whether it is entirely
constant at this stage; all the optimisations can be done later.
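Both desugarings fit in a few lines; here is an OCaml sketch (illustrative AST, with cons applied through a hypothetical Var "cons"):

(* An if with a missing else branch gets an explicit nil arm, and a
   list literal becomes nested cons applications. *)
type expr =
  | If2 of expr * expr            (* sugar: if without an else branch *)
  | If3 of expr * expr * expr
  | ListLit of expr list          (* sugar: [a; b; c] *)
  | Apply of expr * expr list
  | Var of string
  | Nil

let rec desugar (e : expr) : expr =
  match e with
  | If2 (c, t) -> If3 (desugar c, desugar t, Nil)
  | If3 (c, t, f) -> If3 (desugar c, desugar t, desugar f)
  | ListLit xs ->
      List.fold_right
        (fun x acc -> Apply (Var "cons", [desugar x; acc]))
        xs Nil
  | Apply (f, args) -> Apply (desugar f, List.map desugar args)
  | Var _ | Nil -> e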
This is a lexically scoped language, allowing new definitions to override an outer scope. This is a default
design decision for languages with pattern matching, for example. Our core language does not have any pattern
matching support, but it can be added later with macros and syntax extensions.
The simplest approach to implementing lexical scope is to rename all the variables to make them unique.
In a subsequent pass we'll collect variable and argument declarations and sort everything into categories -
global variables, local variables and lambda arguments. Another category, the closure variable, will appear
later (after the lambda lifting pass).
// A helper function, returns a new name if it is in the current
// environment or id unchanged (meaning it's a global name)
renamevar(venv, id) =
   do iloop(e = venv) {
      match e with
         [(iid when iid === id); nnm] : tl → nnm
       | hd : tl → iloop(tl)
       | else → id};
// A helper function to access an identifier bound
// in a let pair
getpident(p) =
   visit:lang0 /rec/ (letpair: p) {
      once letpair {p → id}};
// A helper function: loop into a right hand side
// of a let pair and rename a bound identifier
loopps(ne, p) =
   visit:lang0 /rec/ (letpair: p) {
      deep letpair {
         p → mk:node(id = renamevar(ne, id),
                     v = loop(v, env))}};
// Main visitor: for lambda and let nodes it
// stops and recurses into their sub-nodes explicitly,
// while all the other variants are traversed in the depth-first
// order.
visit:lang0 /rec/ (expr: e) {
   once expr {
      lambda → {
         newenv = map a in args do [a; gensym()];
         addrenames(newenv);
         mk:elambda(args = map [a; b] in newenv do b,
                    benv = env,
                    fenv = ∅,
                    body = loop(body, newenv ⊕ env))}
    | let → {
         newenv = map p in ps do [getpident(p); gensym()];
         addrenames(newenv);
         mk:node(ps = map p in ps do loopps(newenv, p),
                 body = loop(body, newenv ⊕ env))}
    | deep → {var → mk:node(id = renamevar(env, id))
            | else → node()}}}
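The essence of the pass, stripped of the visitor machinery, is ordinary alpha-renaming. A minimal OCaml sketch, assuming a non-recursive let (right-hand sides see only the outer environment, as in loopps above):

(* Every binder gets a fresh name; uses are renamed through the
   environment; unknown names are left alone (treated as globals). *)
type expr =
  | Var of string
  | Lambda of string list * expr
  | Let of (string * expr) list * expr
  | Apply of expr * expr list

let counter = ref 0
let gensym base = incr counter; Printf.sprintf "%s_%d" base !counter

let rec rename (env : (string * string) list) (e : expr) : expr =
  match e with
  | Var v -> Var (try List.assoc v env with Not_found -> v)
  | Lambda (args, body) ->
      let fresh = List.map (fun a -> (a, gensym a)) args in
      Lambda (List.map snd fresh, rename (fresh @ env) body)
  | Let (ps, body) ->
      let fresh = List.map (fun (n, _) -> (n, gensym n)) ps in
      (* right-hand sides see the outer environment only *)
      let ps' =
        List.map2 (fun (_, nn) (_, v) -> (nn, rename env v)) fresh ps in
      Let (ps', rename (fresh @ env) body)
  | Apply (f, args) -> Apply (rename env f, List.map (rename env) args)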
It would be nice to keep track of all the renames we did; it could help
with producing debug information and with giving error messages more context.
Now it's time to stop using the initial AST and move on. We have eliminated a few variants, and are going to introduce
more specific variable variants: global, local, argument and closure variable. We're also introducing variants
for the subsequent lambda lifting.
The new AST is derived from the previous one (lang0), with a few modifications
made to the expr node.
Now we can make our variables more specific, and since all the names are unique we can do it easily in
a depth--first order. Of course we could have done it in the previous pass, but it's better to keep passes as simple
as possible.
function l0_classify_vars_top(t) {
   dict = mkhash();
   // Fill a hash table with origins of all the definitions
   filldict() =
      visit:lang0 /rec/ (top: t) {
         deep expr {
            elambda → iter a in args do ohashput(dict, a, 'arg')
          | else → ∅};
         deep letpair {
            p → ohashput(dict, id, 'local')}};
   filldict();
   // Replace all the var nodes with specific versions
   // and use this opportunity to mark lambda bound variables lists with
   // nearly correct origins. We're still missing closure variables, they will
   // appear later.
   visit:lang0 /rec, dst lang0i/ (top: t) {
      deep expr {
         elambda → mk:node(benv = map [o; n] in benv do
                                     [n; ohashget(dict, n)])
       | var → aif(ct = ohashget(dict, id)) {
                  if (ct === 'arg') mk:arg(id)
                  else if (ct === 'local') mk:local(id)
                  else ccerror('WHAT?'(ct))
               } else mk:global(id)
       | else → node()}}}
The lambda lifting pass will turn all the nested lambda functions into toplevel definitions, capturing closure
environments where necessary. We did the lexical scoping pass previously, so all the names inside any
given context are unique, which makes it easy to produce lists of externally defined variables used under each
lambda context.
First we have to propagate the free variable lists (with the bound variables already collected in the
lexical scoping pass), from top to bottom.
function l0_lambda_lift_freevars(e) {
   // A helper function: returns a list of all
   // identifiers used in a given expression (bound or not)
   collectvars(e) =
      collector(addv, getvs) {
         visit:lang0i /rec/ (expr: e) {
            once expr {
               elambda → {iter v in fenv do addv(v)}
             | deep → {
                  local → addv(id)
                | arg → addv(id)
                | else → ∅}}};
         return getvs()};
   // Add free variable lists to all the lambda nodes,
   // in a depth--first order.
   visit:lang0i /rec/ (expr: e) {
      deep expr {
         elambda → mk:node(fenv = collectvars(body))
       | else → node()}}}
Now we can use the intersection between a bound variable list and
a free variable list to form a closure environment.
function l0_lambda_lift_expr(e, clenv, adddef)
   // An entry for nested lambdas, to keep track of variables
   // that were turned into closure variables in a lifted context.
   do loop(e = e, renv = ∅) {
      // A helper function, generating a list of
      // captured closure variables out of bound and free variables lists
      getclenv(benv, fenv) = with target (lang0i) {
         ht = mkhash();
         iter nm in fenv do ohashput(ht, nm, nm);
         map append [nm; cl] in benv do {
            if (ohashget(ht, nm)) {
               if (memq(nm, renv)) [[nm; mk:expr:clvar(nm)]]
               else if (cl === 'local') [[nm; mk:expr:local(nm)]]
               else if (cl === 'arg') [[nm; mk:expr:arg(nm)]]
               else ccerror('WHAT?'(cl))
            } else ∅}};
      // A main visitor, lifting all the lambda nodes into new
      // toplevel definitions and replacing them with closure allocations,
      // even if no closure environment is captured (we'll deal with this
      // case later). Also using this opportunity to make closure variable
      // references specific.
      visit:lang0i /rec/ (expr: e) {
         once expr {
            elambda → {
               intersect = getclenv(benv, fenv);
               clargs = map [nm; cl] in intersect do nm;
               newlambda = mk:clambda(clenv = clargs,
                                      args = args,
                                      body = loop(body, clargs ⊕ renv));
               newnm = gensym();
               adddef(mk:top:define(newnm, newlambda));
               return mk:mkclosure(newnm, map [nm; cl] in intersect do cl)}
          | local → if (memq(id, renv)) mk:clvar(id) else node()
          | arg → if (memq(id, renv)) mk:clvar(id) else node()
          | deep → {else → node()}}}}
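The underlying computation is a textbook free-variables analysis plus a set intersection. A self-contained OCaml sketch (illustrative AST; the real pass also records whether each captured name was a local or an argument):

(* Collect the variables a lambda body actually uses, intersect them
   with what is bound outside: that intersection is the captured
   closure environment. *)
module S = Set.Make (String)

type expr =
  | Local of string
  | Arg of string
  | Global of string
  | Lambda of string list * expr
  | Apply of expr * expr list

let rec free_vars (e : expr) : S.t =
  match e with
  | Local v | Arg v -> S.singleton v
  | Global _ -> S.empty
  | Lambda (args, body) -> S.diff (free_vars body) (S.of_list args)
  | Apply (f, args) ->
      List.fold_left
        (fun acc a -> S.union acc (free_vars a))
        (free_vars f) args

(* The closure environment of a lambda under the bound set [benv]: *)
let closure_env (benv : S.t) (args : string list) (body : expr) : S.t =
  S.inter benv (S.diff (free_vars body) (S.of_list args))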
Obviously we do not need to lift lambda expressions that are already on top,
so we have to make an exception for this case.
Lambda--lifting a single top level statement may produce any number of new top level statements, so
we'll just collect all the new statements together into one flat list.
By now we have moved all the lambdas into top level definitions, made closure
environments explicit, eliminated lexical scope and reduced some syntax sugar.
Let's refine our AST further. We can classify top level statements as
functions (i.e., lambdas with an empty closure environment), closures,
constants, evaluated definitions and, finally, toplevel expressions.
We also eliminated the temporary elambda variant previously, and are now
going to get rid of nested let bindings, replacing them with flat
imperative assignments. The same AST introduces a special tail call variant.
Another variant introduced here is splitexpr; it is required for
separating expressions and statements further down the pipeline.
Instead of our nice and clean functional variable binding we'll produce an
imperative sequence of destructive assignments here. Some other passes will also
introduce true destructive assignments, e.g., lifting the statement expressions
and optimising direct tail recursion, so we're not losing any purity here. A
subsequent SSA transform will make our IR clean again.
At this stage the IR is suitable for tail call analysis, which would have
been more difficult previously and will be more clumsy later. The results of this
marking will be used further down in the direct tail recursion optimisation and
are required for the statement vs. expression split, since we're also
introducing explicit return statements here.
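Sketched in OCaml, the marking follows only the returning branches (illustrative AST; an application reached in tail position becomes TailApply, everything else is left alone):

(* Walk only the "returning" branches - the last element of a sequence
   and both arms of an if - wrapping applications found there. *)
type expr =
  | Apply of expr * expr list
  | TailApply of expr * expr list
  | If of expr * expr * expr
  | Seq of expr list
  | Var of string

let rec mark_tails (e : expr) : expr =
  match e with
  | Apply (f, args) -> TailApply (f, args)  (* reached in tail position *)
  | If (c, t, f) -> If (c, mark_tails t, mark_tails f)
  | Seq xs ->
      let rec go = function
        | [last] -> [mark_tails last]       (* only the last entry *)
        | x :: rest -> x :: go rest
        | [] -> []
      in
      Seq (go xs)
  | TailApply _ | Var _ -> e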
Next comes a procedure very similar to tail call marking, but for the statement
expressions (like if and seq yielding a value that is used inside
another expression). We'll need it later on when we actually split statements
and expressions.
The following function, just like the tail call marker pass before, will follow
the returning branches of expressions only --- i.e., the last entry in a sequence
and both arms of an if --- but instead of inserting a return or a
tail call marker it will assign that value to a variable or wrap it into a
drop marker, meaning that the result is going to be discarded anyway.
The next function identifies statement--expressions and treats them in a
special way: any value returned from the non--terminal sequence elements is
dropped, and any value of a complex statement--expression that is actually used
is assigned to a variable; a special splitexpr variant is created,
holding both the statement that generates the value and an expression to access
this value. After the split between statements and expressions is completed,
those inner statements will be lifted to their nearest statement scopes.
Some tail calls are tail recursive, and this is the right moment to replace
them with a goto. The language does not have gotos and labels? Not a
problem at all, we'll make a new language now:
If a function is directly tail recursive, we have to create temporary
variables that will initially contain copies of the argument values and then
will be used to pass arguments in a tail--recursive call. In this case all the
argument references in the function body must be replaced with local variable
references.
function l0_tailrec_topexpr(self, fnargs, e) {
   trecp = mkref(∅);
   isself(e) =
      visit:lang0k /rec/ (expr: e) {
         once expr {
            global → id === self
          | else → ∅}};
   rewriteargs(e) =
      visit:lang0k /rec/ (expr: e) {
         deep expr {
            arg → mk:local(id)
          | else → node()}};
   body0 = visit:lang0j /rec, dst lang0k/ (expr: e) {
      deep expr {
         tailapply → if (isself(fn)) {
                        trecp := true;
                        mk:seq(body = [@map [a; v] in zip(fnargs, args) do
                                          mk:set(a, v);
                                       mk:goto('tailrecentry')])
                     } else node()
       | else → node()}};
   if (^trecp) with target (lang0k) {
      mk:expr:seq(body = [@map a in fnargs do mk:alloca(a);
                          @map a in fnargs do mk:set(a, mk:arg(a));
                          mk:goto('tailrecentry');
                          mk:label('tailrecentry');
                          rewriteargs(body0)])
   } else body0}
function l0_tailrec(ts)
   map t in ts do
      visit:lang0j /rec, dst lang0k/ (top: t) {
         once top {
            sfunction → mk:node(body = l0_tailrec_topexpr(id, args, body))
          | deep → {else → node()}};
         once expr {
            // We still have to do this pass even if it's not a function,
            // because we're moving to another AST version here.
            else → l0_tailrec_topexpr('*dummy*', ∅, node())}}
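To see what the rewrite buys us, here is a before/after rendering in OCaml of a hypothetical sum function. The while loop over mutable argument copies is the closest runnable equivalent of the alloca/set/goto sequence the pass emits:

(* Source (lang0-like, hypothetical syntax):
     sum(n, acc) = if n == 0 then acc else sum(n - 1, acc + n)
   After the rewrite, the self tail call becomes "update the local
   copies of the arguments, jump back to tailrecentry": *)
let sum (n0 : int) (acc0 : int) : int =
  let n = ref n0 and acc = ref acc0 in    (* allocas + initial copies *)
  let result = ref 0 in
  let running = ref true in
  while !running do                       (* tailrecentry: *)
    if !n = 0 then begin
      result := !acc;                     (* return acc *)
      running := false
    end else begin
      let n' = !n - 1 and acc' = !acc + !n in
      n := n'; acc := acc'                (* set args, goto tailrecentry *)
    end
  done;
  !result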
By now our source functional language has been transformed into an imperative one,
with explicitly allocated variables, destructive assignments and properly marked
tail calls. Nested lambda functions and lexical scope have been eliminated. There is
already a clear separation between statements and expressions that occurred
naturally during the previous passes. Goto, seq, drop, alloca, set, if, return
and tailapply are statements, while everything else is an expression.
If our target were a stack machine, it would have been possible to keep an
expression-based IR all the way down. Unfortunately, stack machines are not
quite fit for optimisations and analysis, and we're not going down the CPS route
either.
In an SSA-based IR we have to *assign* results to named registers. Control
flow statements must be handled differently from the expressions yielding
results, and for this reason we're starting this split here.
This is an important compiler construction pattern that occurs even in
languages that distinguished statements from expressions from the beginning.
We can formalise this split now. The new AST lang0k defines everything
that returns a value as an expression and every control flow construct as a
statement. An if3 statement does not return anything any longer; we
ensured this by assigning its return value to a variable previously, in the
preparation phase.
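Roughly, the split produces the following two-sorted IR (an OCaml sketch with illustrative names; the compound if is still a statement at this point and becomes gotos later):

(* Values live in expressions, control flow lives in statements. *)
type expr =
  | Const of int
  | Var of string
  | Apply of expr * expr list

type stmt =
  | Seq of stmt list
  | Alloca of string
  | Set of string * expr
  | If of expr * stmt * stmt        (* if3: no longer yields a value *)
  | Goto of string
  | Label of string
  | Return of expr
  | TailApply of expr * expr list
  | Drop of expr                    (* evaluate, discard the result *)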
The following pass does not change the tree structure (besides renaming the
toplevel expr to eval); it merely sorts the variants into their new
nodes.
With a clean and simple low level imperative IR we can now turn to a
different set of compiler construction tools. First we'll lower this IR into an
SSA form and apply all the standard SSA-based torture techniques to it.
The first step in flattening produces almost the same IR, with all the
complex expressions broken down into register assignments. We could have used
local variable assignments here, but since we're going into SSA soon anyway
it does not matter that "registers" are introduced a bit earlier.
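The flattening itself is the classic expression-to-three-address-code transform. A minimal OCaml sketch, assuming every operation is a named function applied to values:

(* Every compound argument is hoisted into a fresh register definition,
   leaving only "reg = fn(values)" instructions. *)
type value = Reg of string | Const of int

type expr =
  | Value of value
  | Apply of string * expr list     (* fn name, compound arguments *)

type insn = Def of string * string * value list  (* reg = fn(values) *)

let counter = ref 0
let fresh () = incr counter; Printf.sprintf "r%d" !counter

(* Returns the value standing for [e] plus the instructions computing
   it, in execution order. *)
let rec flatten (e : expr) : value * insn list =
  match e with
  | Value v -> (v, [])
  | Apply (fn, args) ->
      let rvals, code =
        List.fold_left
          (fun (vs, cs) a ->
             let v, c = flatten a in
             (v :: vs, cs @ c))
          ([], []) args
      in
      let r = fresh () in
      (Reg r, code @ [Def (r, fn, List.rev rvals)])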
Note the lock node here - it is merely an optimisation, ensuring that the
visitor does not descend into already flattened code.
Getting ready to descend to SSA. This involves adding explicit load and store
instructions for the local stack--allocated variables, so we'll lift local
references alongside compound expressions here.
Now we can get rid of the compound statements, replacing them with
gotos and labels. At the moment we only have one kind of compound
statement - an if.
The previous passes could produce nested seq blocks. Since this is the
only compound statement left by now, we can just flatten everything into a single
topmost sequence.
Now everything is perfectly flat, so we can sort instructions into basic
blocks. It is also the right moment to do another node split - separating compound
expressions from leaf values; it will help to enforce the SSA guarantee of never
reassigning simple values (with the only exception of the $\varphi$ nodes, but
this will be sorted out later).
Note that tailapply is gone now, as it has served its purpose already;
it is replaced with an expr node, which will be useful later when we have to
annotate calls.
The first pass over the flat code eliminates redundancy: e.g., if there is a
sequence of labels, only one should remain (and be referred to), because a basic
block can only have one name. If a terminal instruction follows another
terminal instruction, it is unreachable and must be eliminated.
Once this filtering is done, the rules for accumulating instructions into basic
blocks are much simpler.
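The accumulation rules are indeed simple once the stream is filtered; a sketch in OCaml (illustrative instruction set; a fall-through into a label is closed with an explicit goto, just in case):

(* Cut a flat stream into "label, body, terminator" triples. Assumes
   the redundancy filtering already ran: no duplicate labels, no
   unreachable tails after terminators. *)
type insn =
  | Label of string
  | Op of string                    (* any ordinary instruction *)
  | Goto of string
  | Ret

type bblock = { lbl : string; body : insn list; term : insn }

let to_bblocks (code : insn list) : bblock list =
  let rec blocks lbl acc = function
    | (Goto _ | Ret) as t :: rest ->
        { lbl; body = List.rev acc; term = t } :: start rest
    | Label l :: rest ->
        (* fall-through into a label: close with an explicit goto *)
        { lbl; body = List.rev acc; term = Goto l } :: blocks l [] rest
    | i :: rest -> blocks lbl (i :: acc) rest
    | [] -> [{ lbl; body = List.rev acc; term = Ret }]
  and start = function
    | Label l :: rest -> blocks l [] rest
    | [] -> []
    | rest -> blocks "entry" [] rest
  in
  start code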
function l0_seq_body(s)
   visit:lang0flat1 /rec/ (stmt: s) {
      once stmt {
         seq → body
       | else → ccerror('IMPOSSIBLE'())}}
Now we're all set for an SSA--transform (i.e., memory to register). For this
we'll use a specific simplified IR, not to be confused with the more complex
Abstract SSA we'll use later.
function l0_to_genssa(c)
   collector(liftadd, liftget)
   collector(addalloca, getallocas) {
      vals = mkhash();
      addval(nm, v) = ohashput(vals, nm, v);
      newlift(v) =
         symbols(id) {addval(id, v); liftadd([id; 'use'()]); id};
      getexitsterm(t) =
         collector(add, get) {
            visit:lang0flat2 /rec/ (term: t) {
               once labident : add(thisnodesrc())};
            return get()};
      getbbexits(bb) =
         collector(add, get) {
            visit:lang0flat2 /rec/ (bblock: bb) {
               once term {else → add(getexitsterm(node()))}};
            return car(get())};
      getcode(t) =
         visit:lang0flat2 /rec/ (term: t) {
            once term {
               tail → symbols(retn) [[retn; e]]
             | else → ∅}};
      gs = visit:lang0flat2 /rec/ (code: c) {
         deep code {c → bs};
         deep bblock {
            bb → 'b'(lbl,
                     liftget() ⊕ (map append b in body do b) ⊕ getcode(t),
                     getbbexits(thisnodesrc()))};
         deep insn {
            alloca → {addalloca(id); ∅}
          | def → [[id; v]]
          | set → [[gensym(); 'store'(id, v)]]
          | drop → ∅};
         deep expr {
            load → 'load'(id)
          | apply → 'use'(fn, @args)
          | mkclosure → 'use'(@args)};
         deep value {
            reg → id
          | else → newlift(node())}};
      allocas = getallocas();
      // Use the library function to run mem2reg pass on a generic SSA:
      nssa = ssa_transform(gs, allocas);
      return vals : nssa}
Back from the generic SSA
After the generic SSA transform we have all the allocas, loads and stores
eliminated, new phi nodes introduced and equivalent registers renamed. Now
we have to apply the results back to our original AST.
// Generic SSA transform collected register renames, we
// have to follow them here. Also, lifted constant values were
// given temporary register names in the previous transform,
// we have to recover them back.
getrenamed0(n) = {
   do loop(x = ohashget(vmap, n), p = n) {
      if (x) return loop(ohashget(vmap, x), x)
      else return p}};
getrenamed(n) = with target (lang0flat2) {
   nreg = getrenamed0(n);
   aif (chk = ohashget(valsht, nreg)) {
      chk
   } else mk:value:reg(nreg)};
isrenamed(n) = ohashget(vmap, n);
// Now we can collect the phi nodes from the generic SSA representation,
// and then add them to their corresponding source basic blocks.
phis = mkhash();
fillphis() =
   visit:genssa (code: ngen) {
      deep bblock {b → iter ops do ops(name)};
      deep oppair : λ(bb) {op(bb, name)};
      deep iop {
         phi → λ(bb, tgt) with target (lang0flat2) {
            nphi =
               mk:insn:def(tgt,
                  mk:phi(ss = map [f; v] in zip(prevs, vals) do
                                 mk:s(f, getrenamed(v))));
            ohashput(phis, bb,
                     ohashget(phis, bb) ⊕ [nphi])}
       | else → λ(bb, tgt) ∅}};
fillphis();
getphis(lbl) = ohashget(phis, lbl);
// Apply all the changes to the original lang0flat2 code
visit:lang0flat2 /rec/ (code: c) {
   deep bblock {
      bb → {
         phis = getphis(lbl);
         mk:node(body = phis ⊕ map append b in body do b)}};
   deep insn {
      alloca → ∅
    | set → ∅
    | def → if (isrenamed(id)) ∅ else [node()]
    | else → [node()]};
   deep value {
      reg → getrenamed(id)
    | else → node()}}}
And a simple wrapper function to run the generic SSA transform over top level entries:
And the control flow graph for this example is the following:
Abstract SSA
The next few sections are entirely optional. We already have an IR suitable
for code generation, but it is also fit for SSA-based optimisations. MBase provides an
abstract SSA library which, with the help of a user--defined IR model, provides,
among others, the following transforms:
Some of these optimisations are enabling ones and may be useful for our code
generation even if the target platform is capable of doing similar
optimisations on a lower level.
Keep in mind that Abstract SSA is an old, list--form AST, and by converting
our nice recform AST into it we're discarding any associated metadata (most
importantly, source code locations). Also, some semantics is lost and must be
recovered on the way back (e.g., both arguments and closure variables are
represented as registers).
We did not have any select instructions before, but they may appear
after the Abstract SSA optimisations. The backend must be aware of this.
function l0_to_abstract_ssa_code(c) {
   isintrinsicfn(id) = {
      case (id) {
         'binopadd' | 'binopsub' | 'binopmul' | 'binopdiv'
       | 'binopeq' | 'binopneq' → id
       | else → ∅}};
   isintrinsic(f) =
      {match f with
          'glob'(id) → isintrinsicfn(id)
        | else → ∅};
   visit:lang0flat2 /rec/ (code: c) {
      deep code {c → bs};
      deep bblock {
         bb → {
            <te : t1> = t;
            'b'(lbl, body ⊕ te, t1)}};
      deep insn {
         def → [id; v]
       | else → ccerror('IMPOSSIBLE'())};
      deep term {
         goto → ∅ : 'br'(id)
       | gotoc → ∅ : 'brc'(cnd, tr, fl)
       | return → [[gensym(); 'call'(['other'('src'(thisnodesrc()))],
                                     '*return*', v)]] : 'none'()
       | tail → symbols(ret)
            [[ret; e];
             [gensym(); 'call'(['other'('src'(thisnodesrc()))],
                               '*tailreturn*', 'var'(ret))]] : 'none'()};
      deep expr {
         mkclosure → 'call'(['other'('src'(thisnodesrc()))],
                            '*mkclosure*', 'glob'(df), @args)
       | apply → aif (intr = isintrinsic(fn)) {
                    'call'(['other'('src'(thisnodesrc()))], intr, @args)
                 } else 'call'(['other'('src'(thisnodesrc()))],
                               '*funcall*', fn, @args)
       | load → ccerror('IMPOSSIBLE'())
       | phi → 'phi'(@ss)
       | select → 'select'(cnd, tr, fl)};
      deep phisrc {
         s → 'a'(from, v)};
      deep value {
         const → 'const'('any'(), c)
       | global → 'glob'(id)
       | arg → 'var'(id)
       | clvar → 'var'(id)
       | reg → 'var'(id)
       | funref → 'glob'(id)}}}
The Abstract SSA implementation is IR-agnostic and therefore must be provided
with some hints about the nature of this particular IR. We have to be able to
tell pure intrinsics from those that might have side effects in order to
get constant folding working. We have to provide a way to evaluate those
intrinsics when all of their arguments are constant. We must tell how to create
a boolean constant and how to tell whether a constant is a boolean truth. All such
things form a model which complements the abstract SSA passes.
Since this language allows compile time macros, we need two different backends -
one for emitting the final code, and another for interpreting the code that might
be used by the macros.
For the latter we'll implement a primitive .NET backend, partly because it's an
opportunity to showcase a few patterns in compiling an SSA IR into a stack-based
target. It will consist of the following steps:
Out of SSA: we'll simply eliminate all the phi nodes and replace them with
loads and stores placed accordingly.
Back to trees: collapse the register assignment sequences into expression
trees where possible without affecting the effect order.
Register elimination: all the register assignments left after the previous
pass must now be converted into local variables (with explicit loads and
stores).
Stack machine codegen (see the sketch below).
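The last step is the pleasant one: once expressions are trees again, stack machine code is just a postfix walk. A minimal OCaml sketch with a made-up slice of the .NET instruction set:

(* Postfix emission: operands first, operation last - exactly the
   evaluation-stack discipline the .NET backend needs. *)
type expr =
  | Const of int
  | LocalRef of string
  | Call of string * expr list

type stackop =
  | LdcI4 of int                    (* push an int constant *)
  | Ldloc of string                 (* push a local variable *)
  | CallOp of string * int          (* pop n arguments, push the result *)

let rec emit (e : expr) : stackop list =
  match e with
  | Const n -> [LdcI4 n]
  | LocalRef v -> [Ldloc v]
  | Call (fn, args) ->
      List.concat_map emit args @ [CallOp (fn, List.length args)]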
One might ask: why did we go through all those complex rewrite steps, only to get back
to the expression trees again? Firstly, the expression trees are a bit different
now - they do not contain any control flow inside, it is broken into basic blocks
already. Secondly, we had an opportunity to do all the SSA-based transforms, which are
not possible on a primitive expression tree based IR.
And, also, it is a bit ironic, but we're doing redundant work here by making
our IR stack-friendly - the .NET engine will quickly turn it back into SSA again
at the first opportunity.
Out of SSA
function l0_naive_phi_removal(c)
   collector(newvaradd, newvarsget) {
      srcbbs = mkhash();
      // 1. Collect phi variables and the values to
      //    be assigned to them.
      getphis(c) =
         visit:lang0flat2 /rec/ (code: c) {
            deep insn {def → v(id) | else → ∅};
            deep term {tail → e('*tail*') | else → ∅};
            deep expr (dst) {
               phi → {newvaradd(dst);
                      iter [s; v] in ss do {
                         ohashput(srcbbs, s,
                                  [dst; v] : ohashget(srcbbs, s))}}
             | else → ∅};
            deep phisrc {s → [from; v]}};
      getphis(c);
      // 2. Rewrite the phi origin basic blocks and
      //    add all the allocas to the 'entry' basic block.
      visit:lang0flat2 /rec/ (code: c) {
         deep bblock {
            bb → {
               pfx = if (lbl === 'entry') {
                        map v in newvarsget() do {
                           vnm = %Sm<<(v, "_alloc");
                           mk:insn:alloca(vnm)}} else ∅;
               sfx = {
                  chk = ohashget(srcbbs, lbl);
                  map [dst; v] in chk do
                     mk:insn:set(%Sm<<(dst, "_alloc"), v)};
               mk:node(body = pfx ⊕ body ⊕ sfx)}};
         deep term {tail → e('*tail*') | else → node()};
         deep insn {def → mk:node(v = v(id)) | else → node()};
         deep expr (dst) {
            phi → mk:load(%Sm<<(dst, "_alloc"))
          | else → node()}}}
The only cases where a register is set to load(something) come from the
previous pass here, and it is safe to demote all such loads to values.
function l0_bblock_to_trees(cnt, bb)
   collector(ordlistadd, ordget)
   collector(useadd, useget) {
      // 1. Collect the register definitions and
      //    fill the ordered list.
      defsht = mkhash(); ordered = mkhash();
      ordadd(id) = {ordlistadd(id); ohashput(ordered, id, id)};
      fillht() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            once insn {
               def → {
                  ohashput(defsht, id, v);
                  if (l0_is_ordered(v)) ordadd(id);}
             | else → ∅}};
      fillht();
      // 2. See which register references defy The Order.
      mkord() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            once value {
               reg → if (ohashget(ordered, id)) useadd(id)
             | else → ∅}};
      mkord();
      outoford = mkhash();
      do loop(defs = ordget(), uses = useget()) {
         if (not(defs) || not(uses)) ∅
         else {
            d = car(defs); u = car(uses);
            if (d === u) loop(cdr(defs), cdr(uses)) else {
               ohashput(outoford, d, d);
               loop(cdr(defs), uses) // TODO!!! wrong!!!
            }}};
      // 3. Walk in the reverse order and substitute the single-use definitions,
      //    as long as this substitution does not break an order.
      getinsns() =
         visit:lang0flat2x /rec/ (bblock: bb) {
            deep bblock {
               bb → t ⊕ reverse(body)};
            once term {
               tail → [mk:insn:def('_', e)]
             | gotoc → [mk:insn:def('_', mk:value(cnd))]
             | return → [mk:insn:def('_', mk:value(v))]
             | else → ∅}};
      removedht = mkhash();
      removed(id) = ohashget(removedht, id);
      insns = getinsns();
      process_expr(e0) =
         do loop(e = e0) {
            visit:lang0flat2x /rec, dst lang0unflat1/ (expr: e) {
               once value {
                  reg → if (ohashget(cnt, id) == 1) {
                           // is it defined in the same bb?
                           aif (chk = ohashget(defsht, id)) {
                              if (not(ohashget(outoford, id))) {
                                 ohashput(removedht, id, id);
                                 mk:value:expr(loop(chk))
                              } else node()
                           } else node()
                        } else node()
                | else → node()}}};
      newinsns = map append i in insns do
         visit:lang0flat2x /rec, dst lang0unflat1/ (insn: i) {
            once insn {
               def → if (removed(id)) ∅
                     else [mk:node(v = process_expr(v))]
             | set → [mk:node(v = mk:expr(process_expr(mk:expr:value(v))))]
             | else → [node()]}};
      getexpr(i) = {
         ret = mkref();
         visit:lang0unflat1 /rec/ (insn: i) {
            once expr {else → ret := node()}};
         return ^ret};
      getvalue(i) = {
         ret = mkref();
         visit:lang0unflat1 /rec/ (insn: i) {
            once value {else → ret := node()}};
         return ^ret};
      reapplyinsns(lst) =
         visit:lang0flat2x /rec, dst lang0unflat1/ (bblock: bb) {
            deep bblock {
               bb → {
                  <chgp : nt> = t;
                  println('lbl'(lbl, chgp));
                  tmp = gen_pprint_ast(lang0unflat1, insn);
                  iter x in lst do println(%S<<(":: ", tmp(x)));
                  mk:node(body = reverse(if (chgp) cdr(lst) else lst),
                          t = nt)}};
            deep term {
               tail → true : mk:node(e = getexpr(car(lst)))
             | gotoc → true : mk:node(cnd = getvalue(car(lst)))
             | return → true : mk:node(v = getvalue(car(lst)))
             | else → ∅ : node()}};
      return reapplyinsns(newinsns)}
function l0_bblocks_to_trees(c) {
   // 1. We can only move the single use registers. Let's count the number of
   //    uses for all the registers.
   countht = mkhash();
   count(id) = {nxt = aif (chk = ohashget(countht, id)) chk + 1 else 1;
                ohashput(countht, id, nxt)};
   mkcount() =
      visit:lang0flat2x /rec/ (code: c) {
         once value {reg → count(id) | else → ∅}};
   mkcount();
   // 2. For each basic block we maintain the order of expressions that
   //    cannot be reordered and move everything else freely.
   mktree() =
      visit:lang0flat2x /rec, dst lang0unflat1/ (code: c) {
         once bblock {bb → l0_bblock_to_trees(countht, node())}};
   nxt = mktree();
   // 3. Use the count table again, to remove the unused register names
   visit:lang0unflat1 /rec/ (code: nxt) {
      deep insn {
         def → if (not(ohashget(countht, id))) mk:node(id = ∅)
               else node()
       | else → node()}}}
No more registers
This is just a nominal pass by now - demote the remaining registers to loads/stores
and issue explicit allocas for them.
We managed to keep the compiler clean and self-contained up until now, but for a
real backend we have to deal with a runtime library and a global environment, so
this final pass will have to keep looking things up and keep track of the
classes, fields and methods it generates.
The environment contains references to some core library methods (e.g., a select
function), FFI methods, the already compiled classes and methods and, in another
layer, the fields and methods of the class that is being compiled now.
.NET codegen
The general approach is simple - for every chunk of code (a number of
definitions) a class is created. Constants are stored in static fields, and all
functions are turned into static methods. A special 'init' method is generated
to fill all the static fields. We're not using static constructors here because
some side effects of this init method may not touch any of this class's
static fields but affect other global variables instead, so we have to
ensure the initialisation order explicitly.