The directory $TIGER is /cs/cs132/src/tiger
on turing.
Start with the files in $TIGER/chap11; these contain
a complete implementation of the code for Chapter 10, which creates an
interference graph for a sequence of assembly instructions.
Implement the Color and RegAlloc structures,
as shown on Page 253 in Appel. Your code need not handle coalescing
or spilling.
Because of this, it may be easier to implement optimistic coloring
from scratch --- based on the presentation given in class --- rather
than trying to simplify the pseudocode in Chapter 11.
When you
implement the simplify phase (picking an order to remove nodes from
the interference graph) you should not actually modify the graph.
(You'll need the original graph back for the actual coloring phase.)
One possibility --- though certainly not the only one --- is to keep a
list of the nodes not yet removed from the graph, and a mapping
telling for each node in the graph how many of its neighbors have not
yet been removed. There is an implementation of such lookup tables in
the structure Graph.Table.
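The bookkeeping just described can be sketched in Python (the course code is SML, and the names here are hypothetical; this is only an illustration of the idea):

```python
def simplify_order(adj, k):
    """Pick an order in which to 'remove' nodes for the simplify phase,
    without ever modifying the interference graph `adj` itself."""
    remaining = set(adj)                      # nodes not yet removed
    degree = {n: len(adj[n]) for n in adj}    # counts of not-yet-removed neighbors
    stack = []
    while remaining:
        # prefer a node of degree < k; otherwise pick one optimistically
        node = min(remaining, key=lambda n: degree[n])
        stack.append(node)
        remaining.remove(node)
        for m in adj[node]:
            if m in remaining:
                degree[m] -= 1                # only the count changes, not adj
    return stack
```

Popping the stack in reverse order and consulting the original, unmodified adj then gives the coloring phase exactly the graph it needs.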
Appel's suggestion that we add an instruction that uses all
the special registers and the callee-save registers doesn't work
in this case, for a couple of reasons.
It is therefore probably easiest to create two lists:
SparcFrame.registers
which contains all of the Sparc registers (which can be used as
the values to map Temporaries 0..31 to hardware registers) and
another list, something like
SparcFrame.useableRegisters
which contains the subset of Sparc registers to which you would
be willing to allocate temporaries (e.g., not the stack pointer
or the zero register).
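In Python terms (the course code is SML; the particular reserved set below is one plausible choice for illustration, not a specification):

```python
# All 32 Sparc integer registers, in an order that lets temporaries
# 0..31 map to hardware registers by index.
registers = [f"%{bank}{i}" for bank in ("g", "o", "l", "i") for i in range(8)]

# A plausible reserved set: the zero register %g0, the stack pointer %o6
# (%sp), the frame pointer %i6 (%fp), and the return-address registers
# %o7 and %i7. Adjust this to match your own conventions.
reserved = {"%g0", "%o6", "%o7", "%i6", "%i7"}
usable_registers = [r for r in registers if r not in reserved]
```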
All that's left to do after the register allocator is working is cleanup:
Finish the procedure entry/exit routines as specified on
pages 208 and 260--262. For procEntryExit3 you may
assume that every function needs space in its stack frame for 6
outgoing arguments (i.e., that no function passes more than 6
arguments to its callees). This function should probably
add the save and ret/restore
instructions, and an initial .global (function-label) assembler
pseudo-op to make sure the function's label is visible to other files.
Thus, the save instruction in the prolog must reserve
92 bytes + (number of escaping_locals * 4 bytes),
making sure to keep the stack 8-byte aligned and also remembering that
the argument to save should be negative since the stack grows
downwards. (92 = 6 outgoing args (minimum permitted) * 4 bytes
+ 64 bytes for register-window spilling
+ 4 bytes for (unused) pointer for functions returning
structures.)
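The arithmetic above, as a small sketch (the helper name is hypothetical):

```python
def save_argument(escaping_locals):
    """Size argument for the `save` instruction in the prologue:
    92 bytes of fixed overhead plus 4 bytes per escaping local,
    rounded up to keep the stack 8-byte aligned, and negated
    because the stack grows downwards."""
    size = 92 + 4 * escaping_locals
    size = (size + 7) & ~7        # round up to a multiple of 8
    return -size
```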
The Sparc register windows take care of callee-save registers for us automatically, so there's no point in starting the routine by moving the callee-save registers into temporaries, and finishing by moving them back.
Next, the Frame.string function needs to emit assembly code
for string data. The output could look something like:
.section ".data"
(the-label):
    .word (the-length)
    .ascii (the-string-in-quotes)
    .align 4
.section ".text"
[Actually, since strings are read-only, there's little point in putting them in the data segment, rather than leaving them in the read-only text segment with the code. But the above is a perfectly good example of how to allocate in the data segment.]
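A sketch of such an emitter in Python (your SparcFrame.string does the equivalent in SML; the helper name is an assumption, and real strings would also need their quotes and escapes handled):

```python
def string_fragment(label, s):
    """Assembly text for one string literal: length word, then the
    characters, padded back to a word boundary."""
    return "\n".join([
        '.section ".data"',
        f"{label}:",
        f"\t.word {len(s)}",
        f'\t.ascii "{s}"',        # assumes s needs no further escaping
        "\t.align 4",
        '.section ".text"',
    ])
```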
For testing purposes, the function Translate.getResult
should probably reset the list of fragments back to nil;
otherwise fragments from one test can appear in the output of future
tests.
More hints on getting code running (some of which I probably should have mentioned for the previous assignment; sorry):
The supplied runtime.c may
require modifications:
its getchar may conflict with the standard function
getchar; the simplest fix is to rename the function to something like
getchr and change env.sml and
runtime.c appropriately.
The functions in Env.base_venv are
called using the standard calling convention rather than the
external calling convention. Thus, these functions all need to be
given an extra (dummy) initial static link argument (as in the
inttostring example below).
Emit a .align 4 directive
so that the next thing emitted falls on a word boundary.
compute argument 0 into temporary t0
move t0 into %o0
compute argument 1 into temporary t1
move t1 into %o1
...etc...
call function.
And this is fine unless, for example, while computing argument 1
the code writes to %o0.
Now the only reason this should occur is if the computation of
argument 1 required a function call, and the code supplied by Appel
explicitly contains a phase that gets rid of such nested function
calls, so why is there a problem?
The answer is that multiplication and division are implemented on
the SPARC by procedure call, so if argument 1 involves a
product or quotient the above code isn't going to work. The
canonicalizing code doesn't know that these arithmetic operations
are really function calls, because on other architectures they
won't be.
We could modify canonicalize to be no-longer machine independent,
but probably a better solution is to generate the following code
instead:
compute argument 0 into temporary t0
compute argument 1 into temporary t1
...etc...
move t0 into %o0
move t1 into %o1
...etc...
call function.
This applies not only to the arguments of normal procedure calls,
but also to the operands of multiplication and division.
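In schematic form, with strings standing in for real instruction emission (a sketch, not the Assem interface):

```python
def move_args_to_registers(args, arg_regs):
    """Emit (as strings) code that first computes every argument into a
    fresh temporary, then moves the temporaries into the outgoing
    registers, so no argument computation can clobber an already-loaded
    %o register."""
    compute, moves = [], []
    for i, a in enumerate(args):
        t = f"t{i}"
        compute.append(f"compute {a} into {t}")   # may itself contain calls
        moves.append(f"move {t} into {arg_regs[i]}")
    return compute + moves
```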
Adding helper functions to runtime.c can make debugging
easier.
By adding "inttostring" to the initial environment in env.sml, you
can then write test programs that print out integers without
requiring that you write the integer-to-string conversion in Tiger.
(The first parameter is an unused static link.)
struct string *inttostring(void *sl, int a)
{
    struct string *t = (struct string *)malloc(sizeof(int) + 21);
    sprintf(t->chars, "%d", a);
    t->length = strlen(t->chars);
    return t;
}
Submit your code as chapter 12.
You will also need canon.sml from
$TIGER/chap8 and the files in $TIGER/chap9.
This assignment has two parts: implementing the function
SparcGen.codegen (the signature CODEGEN
is given on page 207 of Appel), and extending the Frame
signature and definition as specified on pages 208-209.
The function SparcFrame.string : Tree.label * string -> string
is described on page 262 of Appel (see also the figure on page 260).
It returns the SPARC assembly language that emits the given label and
the given string. For this assignment,
write a function that returns any string you want. (This will
be fixed in a later assignment.)
The code in $TIGER/chap9/main.sml also assumes that your
semant.sml has a function transProg which
creates a new level for the main routine (with label
"tigmain"), invokes transExp, passes it to
Translate.procEntryExit to make a fragment for the main
program, and finally calls Translate.getResult to
return all the fragments.
Be careful with MOVE. Normally a T.TEMP or
a T.MEM appearing in an expression stands for the value
in the specified location. However, when it appears on the
left-hand side of a MOVE it is the location itself that
is being referenced. So the right way to generate code for
MOVE(e1,e2) is not to generate code that starts by
computing the values of e1 and e2.
The dst and src parts of an
Assem.instr are exactly the def and use sets seen in class.
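For instance, rendered in Python (Assem is an SML structure; this tuple is just an illustration of the correspondence):

```python
from collections import namedtuple

# A stripped-down Assem.OPER: the template refers to dst/src by position.
Instr = namedtuple("Instr", "assem dst src")

# add t1, t2 -> t3 : defines t3, uses t1 and t2
add = Instr(assem="add `s0, `s1, `d0", dst=["t3"], src=["t1", "t2"])

defs = set(add.dst)   # the def set for liveness analysis
uses = set(add.src)   # the use set
```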
Combine the files in $TIGER/chap7 with a copy
of your code. Of particular interest are the provided
sparcframe.sml, frame.sig and the skeleton
signature and implementation for Translate.
You may change any of the files however you like (e.g., rewrite code,
change the definitions of types, etc.), but anything machine-specific
should go in the sparcframe.sml file, and anything
related to generic generation of intermediate-code should go in the
translate.sml file.
Modify semant.sml to keep track of the
current nesting level as it goes along. This will mean that functions
like transExp and transDec will need to take a
new argument of type Translate.level, and you will need
to create new sub-levels via Translate.newLevel for each
function definition.
For each function, there is supposed to be a new level. This level
must be associated with the function in the environment, and
also be passed in when translating the body of the function
and when calling procEntryExit.
This can be slightly awkward if your code to extend the environment
is completely separate from the code that compiles the function bodies.
One solution: Create the new levels when extending the
environment; look up the level in the
environment when doing the actual translation
and pass the function's level to transExp and procEntryExit.
Modify semant.sml to call
Translate.allocLocal for every local variable declaration
it comes across (i.e., for every var declaration and every
for-loop index). You do not call allocLocal
on function arguments, though; you can get the access information for
formal parameters by calling Translate.formals.

Modify env.sig and env.sml to keep
track of the access and level information for all variables and functions,
as shown on Page 141 of Appel. Also, generate a unique Temp.label for each
function and store this in the environment as well.

Modify semant.sml to return not only the
type of the code, but the corresponding intermediate representation.

Add functions to the Translate module and its signature that actually do the
translations; each one should generate the intermediate code for a
particular language construct given the intermediate code for all
of the sub-expressions.

Each new level contains a frame.
(You can use SparcFrame.formals to get the access information for
formal parameters including the static link, which will be at the
head of the list.)

Each completed function body should be passed to
Translate.procEntryExit and stored as a "fragment".

Modify the parse.sml file if your changes
make it stop typechecking.

SparcFrame or Translate modules, or anywhere else.
Implement the module Semant, as specified on pages 121-122 of Chapter
5. Be sure
to read the chapter and familiarize yourself with $TIGER/chap5/types.sml before
starting;
the Tiger type system is a bit different.
The only thing one can do with an SML value of type unit ref (other
than assign it the unit value --- which it already contains --- or access
this unit value) is to compare it with another unit ref for equality. Thus,
one can (and Appel's
supplied code does) use unit refs as an easy way to get "unique
stamps" to distinguish between structurally identical types. One could just
as well use a global counter to generate unique type ids; the unit ref hack
is arguably simpler.
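The global-counter alternative looks like this in Python (hypothetical names; Appel's SML code uses unit refs instead):

```python
import itertools

_stamp = itertools.count()   # source of unique type ids

def new_record_type(fields):
    """Structurally identical record types still get distinct stamps."""
    return {"fields": tuple(fields), "stamp": next(_stamp)}

def same_type(t1, t2):
    # compare stamps, never structure
    return t1["stamp"] == t2["stamp"]

# Two records with identical structure are nonetheless different types.
t1 = new_record_type(["x", "y"])
t2 = new_record_type(["x", "y"])
```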
PLUS:
Implement the module FindEscape as discussed on pages
138--139 of the text. Your module must implement the function
findEscape: Absyn.exp -> unit, but the names
and types of any helper functions you define need not exactly
match Appel's suggestions.
This should all be submitted as assignment 5.
Start with the files in $TIGER/chap4. The text
of the chapter contains some example abstract syntax that might be informative.
Remember to group together adjacent type or function definitions into a single abstract syntax declaration.
You'll also need to change the %nonterm line to declare that
your nonterminals now have interesting values associated with them rather than
values of type unit.
For example, if you have a nonterminal exp then you'll need to
say exp of Absyn.exp.
%name, for example.)
$TIGER/chap3/tiger.lex.sml does
not appear to understand comments: it sees /* as DIVIDE
TIMES. So either delete your copy of this tiger.lex.sml
file and plug in your own lexer (modified as on page 82 of the text!) or else
remove all comments from your test inputs.
There are sample inputs in $TIGER/testcases,
but you will almost certainly have to devise your own small tests. If a test
fails to parse, it can be useful to simplify it as much as possible while
retaining the error; this is particularly easy given that the resulting code
need not typecheck or make sense, so long as it is grammatically correct.
(For example, any expression can be replaced by 3, and so on.)
ML-Yacc may report that some rules are never reduced in your
program. This is probably not an error (unless you think you've
finished writing the grammar), but know that these rules are being discarded
rather than checked for conflicts.
The Tiger language definition restricts the use of nil, requires break
to appear only within loops, requires that loop bodies not have a value, etc.
These semantic constraints will all be checked later and should not affect
your grammar.
int and string are just predefined identifiers
and will not appear in the grammar.
You may be tempted to write a single production exp opn exp, where opn
expands to one of the arithmetic or comparison operators. Resist the temptation.
(It interferes with resolving precedence of operations.)
You will need to declare precedences for do, then, else,
the assignment operator := and so on. (For example, the parser
needs to know when seeing while e do e + e whether this is a
while loop added to an expression, or whether the loop body is
e+e. This can be a little tricky; I accidentally made then
and else left associative with the same precedence, and this
caused the parser to group if e1 then e2 else e3 as (if
e1 then e2) else e3, which gets rejected.)
The tiger.grm.desc file contains lines like reduce by
rule 28. This refers to the 29th production rule occurring in the input
(the first rule is rule 0!). It is much easier to find rule 28
if you set up the grammar with one line per rule and no blank lines. Then
just put the cursor on the first rule and go down 28 lines. (In emacs you
can do this by ctrl-u 2 8 followed by the downarrow or ctrl-n.)
If you put %nodefault in your grammar file, the .desc
file will explicitly list every decision for every lookahead token in every
state. This doesn't actually give you any more information, but I found it
slightly easier to think about.
My grammar included the productions "lvalue -> ID" and "lvalue
-> lvalue LBRACK exp RBRACK" among others. Since I also had "exp
-> lvalue" I expected an input like "x[0]" to correctly
parse as an expression (hence as a program). But it didn't until I added the
seemingly redundant production "lvalue -> ID LBRACK exp RBRACK".
It turns out that there was a relatively serious shift/reduce conflict
involving the production "exp -> ID LBRACK exp RBRACK OF exp".
Consider the input "x[3]+1". The right thing to do is to shift
the ID, reduce it to an lvalue, shift LBRACK,
shift INT and reduce it to an exp, shift RBRACK,
and then reduce "lvalue LBRACK exp RBRACK" to an lvalue
and then to an exp. Now consider the input "x[3] of 4". Here the right thing
to do is to shift ID, shift LBRACK, shift INT
and reduce it to an exp, shift RBRACK and OF,
shift the second INT and reduce it to an exp,
and finally reduce "ID LBRACK exp RBRACK OF exp" to an exp.
Note that in the first case we must reduce the identifier x
to an lvalue and in the second case we must not. This is a serious
problem because we have to make the decision having seen only the identifier
and the left bracket, before the two cases can be distinguished.
By default, in the case of a shift-reduce conflict the parser generator
always chooses to shift. Half the time this will be correct. The other half
you end up with "ID LBRACK exp RBRACK" on the stack with a
lookahead other than OF, and the parser has no way to reduce
this; it is not the literal right-hand-side of any production rule. This
is why adding the new production "lvalue -> ID LBRACK exp RBRACK"
is necessary: it lets the parser reduce the array access to an lvalue even
though it made the "wrong" choice earlier by not reducing the
identifier.
So in summary, you will probably need to add a rule like "lvalue
-> ID LBRACK exp RBRACK" even though this looks redundant. This is
in fact the redundant rule mentioned by Appel. (Of course if your parser
is arranged very differently than mine, you might not need this rule.)
Here is an example of such a conflict from my tiger.grm.desc file:
error: state 13: shift/reduce conflict (shift LBRACK, reduce by rule 18)
state 13:
exp : ID . LPAREN exps RPAREN
exp : ID . LBRACE efields RBRACE
exp : ID . LBRACK exp RBRACK OF exp
lvalue : ID . (reduce by rule 18)
lvalue : ID . LBRACK exp RBRACK
EOF reduce by rule 18
COMMA reduce by rule 18
SEMICOLON reduce by rule 18
LPAREN shift 38
RPAREN reduce by rule 18
LBRACK shift 37
RBRACK reduce by rule 18
. error
This reports that a certain state (#13) in the DFA, containing the given items,
has a shift/reduce conflict. In this example, the conflict occurs when the
parser has just seen an identifier [we know this because of where the dots
are in the items] and the lookahead is a left bracket [we know this because
of the error message]. The parser can either shift the bracket and keep going
(hoping to get something like ID[exp] OF exp) or immediately
reduce using rule 18, lvalue : ID.
The lines following the items show what the parser has decided to do for
this state. If the lookahead is EOF or COMMA or
SEMICOLON then it will reduce the ID to an lvalue
(justified by rule 18). If the lookahead is LPAREN it will
shift the LPAREN onto the stack and go to state 38, and so
on. If the lookahead doesn't occur in this list, the parser will take the
default action (in this case, report an error.) Note that in the problem
case of LBRACK, the parser has decided to shift the bracket,
which as discussed above is probably what I wanted.
Start with $TIGER/chap2.
The given tiger.lex is nearly empty, and will need a lot of work to
fill in. One possible order would be:
yytext. (States other than INITIAL must be declared
in the second (Lex-Definitions) portion of the input. The format is a line
starting with %s and then a space-separated list of states terminated
with a semicolon.) Detect unclosed comments.

Extend the lexer to handle string literals. This is tricky because the lexer is required to return strings with all escape sequences expanded away.
One method is to maintain a global buffer. The lexer can extend this buffer
each time it recognizes a piece of the string and can call itself recursively
via continue() to get the remaining pieces, until it reaches
the end of the string. At this point the entire string can be returned.
This technique requires adding at least one more state to the lexer.
An alternate method is simply to match the entire string as a single token, and then write a separate translating function through which the lexer can pass the source text. This simplifies the lexer's organization, but requires you to implement your own code to detect escape sequences.
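A sketch of the second method's translating function in Python (the real lexer is written in ML-Lex; only a few of Tiger's escapes are handled here, and the function name is an assumption):

```python
def expand_escapes(lexeme):
    """Expand escape sequences in the source text of a string literal
    (surrounding quotes already stripped)."""
    out, i = [], 0
    while i < len(lexeme):
        c = lexeme[i]
        if c == "\\":
            nxt = lexeme[i + 1]
            if nxt == "n":
                out.append("\n"); i += 2
            elif nxt == "t":
                out.append("\t"); i += 2
            elif nxt.isdigit():                        # \ddd: 3-digit ASCII code
                out.append(chr(int(lexeme[i + 1:i + 4]))); i += 4
            else:                                      # \\ , \" , etc.
                out.append(nxt); i += 2
        else:
            out.append(c); i += 1
    return "".join(out)
```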
There are test files in $TIGER/testcases to try
once you believe your lexer is working, though you will likely need to create
your own files for testing a partially-implemented lexer. Submit your code with
the following command line: cs132submit tiger.lex
Extend $TIGER/chap1/slp.sml
with the following definitions:
maxargs : stm -> int.

A type table, with lookup : table * id -> int and update :
table * id * int -> table.

interpStm : stm * table -> table and interpExp
: exp * table -> int * table.

interp : stm -> unit.

Submit your code with the following command line: cs132submit slp.sml