All assignments are taken from the PROGRAM sections of Appel's book.
The book refers to files in the directory $TIGER, which corresponds to
the directory /cs/cs132/src/tiger on Turing.
Start with the files in $TIGER/chap11; these contain
a complete implementation of the code for Chapter 10, which creates an
interference graph for a sequence of assembly instructions (I hope).
Implement the Color and RegAlloc structures,
as shown on page 253 of Appel. Your code need not handle coalescing
or spilling.
Because of this, it may be easier to implement optimistic coloring
from scratch --- based on the presentation given in class --- rather
than trying to simplify the pseudocode in Chapter 11.
When you
implement the simplify phase (picking an order to remove nodes from
the interference graph) you should not actually modify the graph.
(You'll need the original graph back for the actual coloring phase.)
Instead, it should suffice to keep a list of the nodes not yet removed
from the graph, together with a mapping that records, for each node in
the graph, how many of its neighbors have not yet been removed. There is an
implementation of such lookup tables, keyed by graph nodes, in the structure
Graph.Table.
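The bookkeeping just described might be sketched as follows. The function name simplifyOrder and the parameter K (the number of available registers) are inventions of mine; Graph.nodes, Graph.adj, Graph.eq, and Graph.Table come from the Graph signature in the chapter 10 code.

```sml
(* A sketch only: compute a removal order without modifying the graph,
   keeping a list of remaining nodes plus a table of current degrees. *)
fun simplifyOrder (nodes : Graph.node list, K : int) =
    let
      val initDegs =
          foldl (fn (n, t) => Graph.Table.enter (t, n, length (Graph.adj n)))
                Graph.Table.empty nodes
      fun deg (t, n) = valOf (Graph.Table.look (t, n))
      fun loop ([], _, stack) = stack   (* head = last removed = first colored *)
        | loop (remaining, degs, stack) =
            let
              (* prefer a node of degree < K; otherwise remove one
                 optimistically (a potential spill) *)
              val n = case List.find (fn m => deg (degs, m) < K) remaining of
                          SOME m => m
                        | NONE => hd remaining
              val remaining' =
                  List.filter (fn m => not (Graph.eq (m, n))) remaining
              val degs' =
                  foldl (fn (m, t) => Graph.Table.enter (t, m, deg (t, m) - 1))
                        degs (Graph.adj n)
            in loop (remaining', degs', n :: stack) end
    in loop (nodes, initDegs, []) end
```

The coloring phase can then walk the returned list, assigning each node a color not used by its already-colored neighbors in the original (unmodified) graph.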
There's very little left to do after the register allocator is
working: finish up the procedure entry/exit routines as specified on
pages 208 and 260--262. For procEntryExit3 you may
assume that every function needs space in its stack frame for 6
outgoing arguments (i.e., that no function passes more than 6
arguments to its callees). Make Translate.procEntryExit
call all these routines in sequence.
Note: For testing purposes, the function
Translate.getResult should probably reset the list of
fragments back to nil; otherwise fragments from one test can appear
in the output of future tests.
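A minimal sketch of that reset, assuming the fragment list inside Translate is kept in a ref (the name fragList is my invention):

```sml
val fragList : Frame.frag list ref = ref nil

fun getResult () =
    let val fs = !fragList
    in
      fragList := nil;   (* so the next test starts with an empty list *)
      fs
    end
```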
Submit your code as chapter 12.
Start with canon.sml from $TIGER/chap8 and the files in $TIGER/chap9.
This assignment has two parts: implementing the structure SparcGen
(whose signature CODEGEN is given on page 207 of Appel),
and extending the Frame signature and definition
as specified on pages 208--209.
The function SparcFrame.string : Tree.label * string -> string
is described on page 262 of Appel (see also the figure on page 260).
It returns the SPARC assembly language that emits the given label
followed by the given string literal. For this assignment,
write a function that returns any string you want. (This will
be fixed in a later assignment.)
The code in $TIGER/chap9/main.sml also assumes that your
semant.sml has a function transProg which
creates a new level for the main routine, invokes
transExp, passes the result to
Translate.procEntryExit to make a fragment for the main
program, and finally calls Translate.getResult.
Be careful with MOVE. Normally a T.TEMP or
a T.MEM appearing in an expression stands for the value
in the specified location. However, when it appears on the
left-hand-side of a MOVE it is the location itself that
is being referenced. So the right way to generate code for
MOVE(e1,e2) is not to generate code that starts by
computing the values of e1 and e2.
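For instance, a munchStm in the style of Chapter 9 might treat the two left-hand-side forms specially. The SPARC opcodes and the exact Assem record fields below are a sketch, not tested code:

```sml
fun munchStm (T.MOVE (T.MEM addr, e)) =
      (* compute the *address* addr and the *value* e;
         never fetch the value stored at addr *)
      emit (A.OPER {assem = "st `s1, [`s0]\n",
                    src = [munchExp addr, munchExp e],
                    dst = [], jump = NONE})
  | munchStm (T.MOVE (T.TEMP t, e)) =
      (* the temp is a destination, not a value to compute *)
      emit (A.MOVE {assem = "mov `s0, `d0\n",
                    src = munchExp e, dst = t})
```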
Start with the files in $TIGER/chap7, and plug in a working
tiger.lex, tiger.grm, and a FindEscape
module. (If you do not have a working FindEscape module,
comment out the corresponding call in parse.sml).
You may change any of the files however you like (e.g., rewrite code,
change the definitions of types, etc.), but anything machine-specific
should go in the sparcframe.sml file, and anything
related to actual generation of intermediate code should go in the
translate.sml file.
Modify semant.sml to keep track of the
current nesting level as it goes along. This will mean that functions
like transExp and transDec will need to take a
new argument of type Translate.level, and you will need
to create new sub-levels via Translate.newLevel for each
function definition. Modify semant.sml to call
Translate.allocLocal for every local variable it comes
across (i.e., for every var declaration and every
for-loop index). You do not call allocLocal
on function arguments, though; you can get the access information for
formal parameters by calling Translate.formals.
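For example, the var-declaration case of transDec might look roughly like this. The record fields match Absyn's VarDec; the curried form of transExp, the S.enter and E.VarEntry names, and the access field in VarEntry are assumptions about how you have set up your environments (compare page 141):

```sml
(* Sketch: allocate a frame slot (or register) for each var declaration,
   using the escape information computed by FindEscape. *)
fun transDec (level, venv, tenv, A.VarDec {name, escape, init, ...}) =
      let
        val {exp = initExp, ty = initTy} = transExp (level, venv, tenv) init
        val access = Translate.allocLocal level (!escape)
        val venv' = S.enter (venv, name,
                             E.VarEntry {access = access, ty = initTy})
      in
        (venv', tenv)
      end
```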
Modify env.sig.sml and env.sml to keep
track of the access and level information for all variables and functions,
as shown on Page 141 of Appel. Also, generate a unique Temp.label for each
function and store this in the environment as well.
Modify semant.sml to return not only the
type of the code, but the corresponding intermediate representation.
You will have to change the return type of the translation functions.
Feel free to rewrite bits of the typechecker if the organization makes
this difficult.
Add functions to the Translate module and its signature that actually do the
translations; each one should generate the intermediate code for a
particular language construct given the intermediate code for all
of the sub-expressions.
frame.
(You can use SparcFrame.formals to get the access information for
formal parameters including the static link, which will be at the
head of the list.)
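One way to use this fact inside Translate is sketched below; the Level and Outermost constructor names are my own arrangement for the level datatype, not something fixed by the book:

```sml
(* Sketch: Translate.formals returns (level, access) pairs for the
   user-visible parameters only.  The static link's access is at the
   head of SparcFrame.formals and is dropped here. *)
fun formals (lev as Level {frame, ...}) =
      (case SparcFrame.formals frame of
           [] => []
         | _ :: userFormals => map (fn a => (lev, a)) userFormals)
  | formals Outermost = []
```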
Translate.procEntryExit.
You may also need to change the parse.sml file if the calling convention
for the typechecker changes.
SparcFrame or
Translate modules, or anywhere else.
Implement FindEscape as discussed on pages
138--139 of the text. Your module must implement the function
findEscape: Absyn.exp -> unit. However, the names
and types of any helper functions you define need not exactly
match Appel's suggestions. When submitting your code, record this as assignment 6, not 5!
Start with the files in $TIGER/chap4. The text of the chapter contains some
example abstract syntax that it might be helpful to look at.
Remember to group together adjacent type or function definitions into a single abstract syntax declaration.
You'll also need to change the
%nonterm line to declare that the nonterminals now are
returning interesting values rather than unit. For example, if you
have a nonterminal exp then you'll need to say exp
of Absyn.exp
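A few representative entries, in ML-Yacc's %nonterm format (the exact set of nonterminals depends on your grammar):

```
%nonterm program of Absyn.exp
       | exp of Absyn.exp
       | lvalue of Absyn.var
       | dec of Absyn.dec
       | decs of Absyn.dec list
```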
(You should not need to change other declarations such as %name, for example.)
The provided $TIGER/chap3/tiger.lex.sml
does not appear to understand comments: it sees /* as
DIVIDE TIMES. So either delete your copy of this
tiger.lex.sml file and plug in your own lexer (modified
as on page 82 of the text!) or else remove all comments from your test
inputs.
There are test cases in $TIGER/testcases, but you will almost certainly have to
devise your own small tests. If a test fails to parse, it can be
useful to simplify it as much as possible while retaining the error;
this is particularly easy given that the resulting code need not
typecheck or make sense, so long as it is grammatically correct. (For
example, any expression can be replaced by 3, and so on.)
If you put %nodefault in your grammar file, the
.desc file will explicitly list every decision for every
lookahead token in every state. This doesn't actually give you any
more information, but I found it slightly easier to think about.
While the grammar is incomplete, some rules may be unreachable from
the start symbol program, which triggers warnings. This is probably not an
error (unless you think you've finished writing the grammar), but be
aware that these rules are being discarded rather than checked for
conflicts.
int and string are just predefined
identifiers and will not appear in the grammar.
You may be tempted to write a single rule exp opn exp where
opn expands to one of the arithmetic or comparison operators.
Resist the temptation: it interferes with resolving the precedence of operations.
You will need to assign precedences to tokens such as do,
then, else, the assignment operator
:= and so on. (For example, the parser needs to know when
seeing while e do e + e whether this is a
while loop added to an expression, or whether the loop
body is e+e.) This can be a little tricky; I accidentally
made then and else left associative with the
same precedence, and this caused the parser to group if e1 then
e2 else e3 as (if e1 then e2) else e3.
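One workable set of ML-Yacc precedence declarations is sketched below; the token names assume the usual tiger.grm conventions, and the key point is that ELSE is declared after (i.e., with higher precedence than) THEN, so an else attaches to the nearest if:

```
%nonassoc DO
%nonassoc THEN
%nonassoc ELSE
%nonassoc ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left PLUS MINUS
%left TIMES DIVIDE
```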
It turns out that there was a relatively serious shift/reduce conflict involving the production "exp -> ID LBRACK exp RBRACK OF exp". Consider the input "x[3]+1". The right thing to do is to shift the ID, reduce it to an lvalue, shift LBRACK, shift INT and reduce it to an exp, shift RBRACK, and then reduce "lvalue LBRACK exp RBRACK" to an lvalue and then to an exp. Now consider the input "x[3] of 4". Here the right thing to do is to shift ID, shift LBRACK, shift INT and reduce it to an exp, shift RBRACK and OF, shift the second INT and reduce it to an exp, and finally reduce "ID LBRACK exp RBRACK OF exp" to an exp.
Note that in the first case we must reduce the identifier x to an lvalue and the second case we must not. This is a serious problem because we have to make the decision having seen only the identifier and the left bracket, before the two cases can be distinguished.
By default, in the case of a shift-reduce conflict the parser generator always chooses to shift. Half the time this will be correct. The other half you end up with "ID LBRACK exp RBRACK" on the stack with a lookahead other than OF, and the parser has no way to reduce this; it is not the literal right-hand-side of any production rule. This is why adding the new production "lvalue -> ID LBRACK exp RBRACK" is necessary: it lets the parser reduce the array access to an lvalue even though it did the wrong thing earlier by not reducing the identifier.
So in summary, you will probably need to add a rule like "lvalue -> ID LBRACK exp RBRACK" even though this looks redundant. (This is in fact the redundant rule mentioned by Appel.) (Of course if your parser is arranged very differently than mine, you might not.)
The tiger.grm.desc file contains lines like
reduce by rule 28. This refers to the 29th production
rule occurring in the input (the first rule is rule 0!).
It is much easier to find rule 28
if you set up the grammar with one line per rule and no blank lines.
Then just put the cursor on the first rule and go down 28 lines. (In
emacs you can do this by ctrl-u 2 8 followed by the downarrow
or ctrl-n.)
Here is a sample conflict report from the tiger.grm.desc file:
error: state 13: shift/reduce conflict (shift LBRACK, reduce by rule 18)
state 13:
exp : ID . LPAREN exps RPAREN
exp : ID . LBRACE efields RBRACE
exp : ID . LBRACK exp RBRACK OF exp
lvalue : ID . (reduce by rule 18)
lvalue : ID . LBRACK exp RBRACK
EOF reduce by rule 18
COMMA reduce by rule 18
SEMICOLON reduce by rule 18
LPAREN shift 38
RPAREN reduce by rule 18
LBRACK shift 37
RBRACK reduce by rule 18
. error
This reports that a certain state (#13) in the DFA, containing the given
items, has a shift/reduce conflict. In this example, the
conflict occurs when the parser has just seen an identifier [we know this
because of where the dots are in the items] and the lookahead is
a left bracket [we know this because of the error message]. The parser
can either shift the bracket and keep going (hoping to get something like
ID[exp] OF exp) or immediately reduce the identifier to an lvalue.
The lines following the items show what the parser has decided to do for this state. If the lookahead is EOF or COMMA or SEMICOLON then it will reduce the ID to an lvalue (justified by rule 18). If the lookahead is LPAREN it will shift the LPAREN onto the stack and go to state 38, and so on. If the lookahead doesn't occur in this list, the parser will take the default action (in this case, report an error). Note that in the problem case of LBRACK, the parser has decided to shift the bracket, which is probably what we want.
Start with the files in $TIGER/chap2. The
given tiger.lex is nearly empty, and will need a lot of work to fill in.
It is suggested (though not required) that you proceed in the following order:
(Each start state other than INITIAL must be declared
in the second (Lex-Definitions) portion of the input. The format is a line starting with
%s and then a space-separated list of states terminated with a semicolon.)
Detect unclosed comments. Extend the lexer to handle string literals. This is tricky because the lexer is required to return strings with all escape sequences expanded away.
One method is to maintain
a global buffer. The lexer can extend this buffer
each time it recognizes a piece of the string and can call itself recursively via continue()
to get the remaining pieces, until it reaches the end of the string.
At this point the entire string can be returned. This technique requires adding at least one more
state to the lexer.
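The buffer approach might be sketched in ML-Lex as follows. The STRING start state (declared with %s STRING) and the two helper refs are assumptions of this sketch, and the rule set is far from complete: it handles only a couple of escapes and ignores positions of embedded escapes.

```
(* in the definitions section:
     val buf = ref ""  and  strStart = ref 0  *)

<INITIAL> \"      => (YYBEGIN STRING; strStart := yypos; buf := ""; continue());
<STRING> \\n      => (buf := !buf ^ "\n"; continue());
<STRING> \\t      => (buf := !buf ^ "\t"; continue());
<STRING> [^"\\]+  => (buf := !buf ^ yytext; continue());
<STRING> \"       => (YYBEGIN INITIAL;
                      Tokens.STRING (!buf, !strStart, yypos + 1));
```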
An alternate method is simply to match the entire string as a single token, and then write a separate translating function through which the lexer can pass the source text. This simplifies the lexer's organization, but requires you to implement your own code to detect escape sequences.
There are sample programs in $TIGER/samples to try once
you believe your lexer is working, though you will likely need to create your own
files for testing a partially-implemented lexer.
Submit your code with the following command line: cs132submit tiger.lex
Extend $TIGER/chap1/slp.sml
with the following definitions:
maxargs : stm -> int;
a type table with functions lookup : table * id -> int and
update : table * id * int -> table;
interpStm : stm * table -> table and
interpExp : exp * table -> int * table;
interp : stm -> unit.
Submit your code with the following command line: cs132submit slp.sml
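As a starting point, maxargs can be written by mutual recursion over the stm and exp datatypes already defined in slp.sml (CompoundStm, AssignStm, PrintStm, OpExp, and EseqExp are the interesting cases; maxargsExp is a helper name of my choosing):

```sml
fun maxargs (CompoundStm (s1, s2)) = Int.max (maxargs s1, maxargs s2)
  | maxargs (AssignStm (_, e))     = maxargsExp e
  | maxargs (PrintStm exps) =
      (* this PrintStm's own argument count, or a larger PrintStm
         nested inside one of its arguments *)
      foldl Int.max (length exps) (map maxargsExp exps)
and maxargsExp (OpExp (e1, _, e2)) = Int.max (maxargsExp e1, maxargsExp e2)
  | maxargsExp (EseqExp (s, e))    = Int.max (maxargs s, maxargsExp e)
  | maxargsExp _                   = 0
```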