All assignments are taken from the PROGRAM sections of Appel's book.
The book refers to files in the directory $TIGER, which corresponds to
the directory /cs/cs132/src/tiger on Turing.
Start with the files in $TIGER/chap11; these contain
a complete implementation of the code for Chapter 10, which creates an
interference graph for a sequence of assembly instructions (I hope).
Implement the Color and RegAlloc structures,
as shown on page 253 of Appel. Your code need not handle coalescing
or spilling.
Because of this, it may be easier to implement optimistic coloring
from scratch --- based on the presentation given in class --- rather
than trying to simplify the pseudocode in Chapter 11.
When you
implement the simplify phase (picking an order to remove nodes from
the interference graph) you should not actually modify the graph.
(You'll need the original graph back for the actual coloring phase.)
Instead, it should suffice to keep a list of the nodes not yet removed
from the graph, together with a mapping that records, for each node in
the graph, how many of its neighbors have not yet been removed. There is an
implementation of such lookup tables, keyed by graph nodes, in the structure
Graph.Table.
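The bookkeeping just described might be sketched as follows. The function name simplifyOrder and the parameter K (the number of available registers) are inventions of mine; Graph.nodes, Graph.adj, Graph.eq, and Graph.Table come from the Graph signature in the chapter 10 code.

```sml
(* A sketch only: compute a removal order without modifying the graph,
   keeping a list of remaining nodes plus a table of current degrees. *)
fun simplifyOrder (nodes : Graph.node list, K : int) =
    let
      val initDegs =
          foldl (fn (n, t) => Graph.Table.enter (t, n, length (Graph.adj n)))
                Graph.Table.empty nodes
      fun deg (t, n) = valOf (Graph.Table.look (t, n))
      fun loop ([], _, stack) = stack   (* head = last removed = first colored *)
        | loop (remaining, degs, stack) =
            let
              (* prefer a node of degree < K; otherwise remove one
                 optimistically (a potential spill) *)
              val n = case List.find (fn m => deg (degs, m) < K) remaining of
                          SOME m => m
                        | NONE => hd remaining
              val remaining' =
                  List.filter (fn m => not (Graph.eq (m, n))) remaining
              val degs' =
                  foldl (fn (m, t) => Graph.Table.enter (t, m, deg (t, m) - 1))
                        degs (Graph.adj n)
            in loop (remaining', degs', n :: stack) end
    in loop (nodes, initDegs, []) end
```

The coloring phase can then walk the returned list, assigning each node a color not used by its already-colored neighbors in the original (unmodified) graph.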
There's very little left to do after the register allocator is
working: finish up the procedure entry/exit routines as specified on
pages 208 and 260--262. For procEntryExit3 you may
assume that every function needs space in its stack frame for 6
outgoing arguments (i.e., that no function passes more than 6
arguments to its callees). Make Translate.procEntryExit
call all these routines in sequence.
Note: For testing purposes, the function
Translate.getResult should probably reset the list of
fragments back to nil; otherwise fragments from one test can appear
in the output of future tests.
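A minimal sketch of that reset, assuming the fragment list inside Translate is kept in a ref (the name fragList is my invention):

```sml
val fragList : Frame.frag list ref = ref nil

fun getResult () =
    let val fs = !fragList
    in
      fragList := nil;   (* so the next test starts with an empty list *)
      fs
    end
```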
Submit your code as chapter 12.
Start with canon.sml from $TIGER/chap8 and the files in $TIGER/chap9.
This assignment has two parts: implementing the structure SparcGen
(whose signature CODEGEN is given on page 207 of Appel),
and extending the Frame signature and definition
as specified on pages 208--209.
The function SparcFrame.string : Tree.label * string -> string
is described on page 262 of Appel (see also the figure on page 260).
It returns the SPARC assembly language that emits the given label
followed by the given string literal. For this assignment,
write a function that returns any string you want. (This will
be fixed in a later assignment.)
The code in $TIGER/chap9/main.sml also assumes that your
semant.sml has a function transProg which
creates a new level for the main routine, invokes
transExp, passes the result to
Translate.procEntryExit to make a fragment for the main
program, and finally calls Translate.getResult.
Be careful with MOVE. Normally a T.TEMP or
a T.MEM appearing in an expression stands for the value
in the specified location. However, when it appears on the
left-hand-side of a MOVE it is the location itself that
is being referenced. So the right way to generate code for
MOVE(e1,e2) is not to generate code that starts by
computing the values of e1 and e2.
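For instance, a munchStm in the style of Chapter 9 might treat the two left-hand-side forms specially. The SPARC opcodes and the exact Assem record fields below are a sketch, not tested code:

```sml
fun munchStm (T.MOVE (T.MEM addr, e)) =
      (* compute the *address* addr and the *value* e;
         never fetch the value stored at addr *)
      emit (A.OPER {assem = "st `s1, [`s0]\n",
                    src = [munchExp addr, munchExp e],
                    dst = [], jump = NONE})
  | munchStm (T.MOVE (T.TEMP t, e)) =
      (* the temp is a destination, not a value to compute *)
      emit (A.MOVE {assem = "mov `s0, `d0\n",
                    src = munchExp e, dst = t})
```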
Start with the files in $TIGER/chap7, and plug in a working
tiger.lex, tiger.grm, and a FindEscape
module. (If you do not have a working FindEscape module,
comment out the corresponding call in parse.sml).
You may change any of the files however you like (e.g., rewrite code,
change the definitions of types, etc.), but anything machine-specific
should go in the sparcframe.sml file, and anything
related to actual generation of intermediate code should go in the
translate.sml file.
Modify semant.sml to keep track of the
current nesting level as it goes along. This will mean that functions
like transExp and transDec will need to take a
new argument of type Translate.level, and you will need
to create new sub-levels via Translate.newLevel for each
function definition. Modify semant.sml to call
Translate.allocLocal for every local variable it comes
across (i.e., for every var declaration and every
for-loop index). You do not call allocLocal
on function arguments, though; you can get the access information for
formal parameters by calling Translate.formals.
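For example, the var-declaration case of transDec might look roughly like this. The record fields match Absyn's VarDec; the curried form of transExp, the S.enter and E.VarEntry names, and the access field in VarEntry are assumptions about how you have set up your environments (compare page 141):

```sml
(* Sketch: allocate a frame slot (or register) for each var declaration,
   using the escape information computed by FindEscape. *)
fun transDec (level, venv, tenv, A.VarDec {name, escape, init, ...}) =
      let
        val {exp = initExp, ty = initTy} = transExp (level, venv, tenv) init
        val access = Translate.allocLocal level (!escape)
        val venv' = S.enter (venv, name,
                             E.VarEntry {access = access, ty = initTy})
      in
        (venv', tenv)
      end
```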
Modify env.sig.sml and env.sml to keep
track of the access and level information for all variables and functions,
as shown on Page 141 of Appel. Also, generate a unique Temp.label for each
function and store this in the environment as well.
Modify semant.sml to return not only the
type of the code, but the corresponding intermediate representation.
You will have to change the return type of the translation functions.
Feel free to rewrite bits of the typechecker if the organization makes
this difficult.
Add functions to the Translate module and its signature that actually do the
translations; each one should generate the intermediate code for a
particular language construct given the intermediate code for all
of the sub-expressions.
frame.
(You can use SparcFrame.formals to get the access information for
formal parameters including the static link, which will be at the
head of the list.)
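One way to use this fact inside Translate is sketched below; the Level and Outermost constructor names are my own arrangement for the level datatype, not something fixed by the book:

```sml
(* Sketch: Translate.formals returns (level, access) pairs for the
   user-visible parameters only.  The static link's access is at the
   head of SparcFrame.formals and is dropped here. *)
fun formals (lev as Level {frame, ...}) =
      (case SparcFrame.formals frame of
           [] => []
         | _ :: userFormals => map (fn a => (lev, a)) userFormals)
  | formals Outermost = []
```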
Translate.procEntryExit.
You may also need to change the parse.sml file if the calling convention
for the typechecker changes.
SparcFrame or
Translate modules, or anywhere else.
Implement FindEscape as discussed on pages
138--139 of the text. Your module must implement the function
findEscape: Absyn.exp -> unit. However, the names
and types of any helper functions you define need not exactly
match Appel's suggestions. When submitting your code, record this as assignment 6, not 5!
Start with the files in $TIGER/chap4. The text of the chapter contains some
example abstract syntax that it might be helpful to look at.
Remember to group together adjacent type or function definitions into a single abstract syntax declaration.
You'll also need to change the
%nonterm line to declare that the nonterminals now are
returning interesting values rather than unit. For example, if you
have a nonterminal exp then you'll need to say exp
of Absyn.exp
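A few representative entries, in ML-Yacc's %nonterm format (the exact set of nonterminals depends on your grammar):

```
%nonterm program of Absyn.exp
       | exp of Absyn.exp
       | lvalue of Absyn.var
       | dec of Absyn.dec
       | decs of Absyn.dec list
```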
(You should not need to change other declarations such as %name, for example.)
The provided $TIGER/chap3/tiger.lex.sml
does not appear to understand comments: it sees /* as
DIVIDE TIMES. So either delete your copy of this
tiger.lex.sml file and plug in your own lexer (modified
as on page 82 of the text!) or else remove all comments from your test
inputs.
There are test cases in $TIGER/testcases, but you will almost certainly have to
devise your own small tests. If a test fails to parse, it can be
useful to simplify it as much as possible while retaining the error;
this is particularly easy given that the resulting code need not
typecheck or make sense, so long as it is grammatically correct. (For
example, any expression can be replaced by 3, and so on.)
If you put %nodefault in your grammar file, the
.desc file will explicitly list every decision for every
lookahead token in every state. This doesn't actually give you any
more information, but I found it slightly easier to think about.
While the grammar is incomplete, some rules may be unreachable from
the start symbol program, which triggers warnings. This is probably not an
error (unless you think you've finished writing the grammar), but be
aware that these rules are being discarded rather than checked for
conflicts.
int and string are just predefined
identifiers and will not appear in the grammar.
You may be tempted to write a single rule exp opn exp where
opn expands to one of the arithmetic or comparison operators.
Resist the temptation: it interferes with resolving the precedence of operations.
You will need to assign precedences to tokens such as do,
then, else, the assignment operator
:= and so on. (For example, the parser needs to know when
seeing while e do e + e whether this is a
while loop added to an expression, or whether the loop
body is e+e.) This can be a little tricky; I accidentally
made then and else left associative with the
same precedence, and this caused the parser to group if e1 then
e2 else e3 as (if e1 then e2) else e3.
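One workable set of ML-Yacc precedence declarations is sketched below; the token names assume the usual tiger.grm conventions, and the key point is that ELSE is declared after (i.e., with higher precedence than) THEN, so an else attaches to the nearest if:

```
%nonassoc DO
%nonassoc THEN
%nonassoc ELSE
%nonassoc ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left PLUS MINUS
%left TIMES DIVIDE
```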
It turns out that there was a relatively serious shift/reduce conflict involving the production "exp -> ID LBRACK exp RBRACK OF exp". Consider the input "x[3]+1". The right thing to do is to shift the ID, reduce it to an lvalue, shift LBRACK, shift INT and reduce it to an exp, shift RBRACK, and then reduce "lvalue LBRACK exp RBRACK" to an lvalue and then to an exp. Now consider the input "x[3] of 4". Here the right thing to do is to shift ID, shift LBRACK, shift INT and reduce it to an exp, shift RBRACK and OF, shift the second INT and reduce it to an exp, and finally reduce "ID LBRACK exp RBRACK OF exp" to an exp.
Note that in the first case we must reduce the identifier x to an lvalue and the second case we must not. This is a serious problem because we have to make the decision having seen only the identifier and the left bracket, before the two cases can be distinguished.
By default, in the case of a shift-reduce conflict the parser generator always chooses to shift. Half the time this will be correct. The other half you end up with "ID LBRACK exp RBRACK" on the stack with a lookahead other than OF, and the parser has no way to reduce this; it is not the literal right-hand-side of any production rule. This is why adding the new production "lvalue -> ID LBRACK exp RBRACK" is necessary: it lets the parser reduce the array access to an lvalue even though it did the wrong thing earlier by not reducing the identifier.
So in summary, you will probably need to add a rule like "lvalue -> ID LBRACK exp RBRACK" even though this looks redundant. (This is in fact the redundant rule mentioned by Appel.) (Of course if your parser is arranged very differently than mine, you might not.)
The tiger.grm.desc file contains lines like
reduce by rule 28. This refers to the 29th production
rule occurring in the input (the first rule is rule 0!).
It is much easier to find rule 28
if you set up the grammar with one line per rule and no blank lines.
Then just put the cursor on the first rule and go down 28 lines. (In
emacs you can do this by ctrl-u 2 8 followed by the downarrow
or ctrl-n.)
Here is a sample conflict report from the tiger.grm.desc file:
error: state 13: shift/reduce conflict (shift LBRACK, reduce by rule 18)
state 13:
exp : ID . LPAREN exps RPAREN
exp : ID . LBRACE efields RBRACE
exp : ID . LBRACK exp RBRACK OF exp
lvalue : ID . (reduce by rule 18)
lvalue : ID . LBRACK exp RBRACK
EOF reduce by rule 18
COMMA reduce by rule 18
SEMICOLON reduce by rule 18
LPAREN shift 38
RPAREN reduce by rule 18
LBRACK shift 37
RBRACK reduce by rule 18
. error
This reports that a certain state (#13) in the DFA, containing the given
items, has a shift/reduce conflict. In this example, the
conflict occurs when the parser has just seen an identifier [we know this
because of where the dots are in the items] and the lookahead is
a left bracket [we know this because of the error message]. The parser
can either shift the bracket and keep going (hoping to get something like
ID[exp] OF exp) or immediately reduce the identifier to an lvalue.
The lines following the items show what the parser has decided to do for this state. If the lookahead is EOF or COMMA or SEMICOLON then it will reduce the ID to an lvalue (justified by rule 18). If the lookahead is LPAREN it will shift the LPAREN onto the stack and go to state 38, and so on. If the lookahead doesn't occur in this list, the parser will take the default action (in this case, report an error). Note that in the problem case of LBRACK, the parser has decided to shift the bracket, which is probably what we want.
Start with the files in $TIGER/chap2. The
given tiger.lex is nearly empty, and will need a lot of work to fill in.
It is suggested (though not required) that you proceed in the following order:
(Each start state other than INITIAL must be declared
in the second (Lex-Definitions) portion of the input. The format is a line starting with
%s and then a space-separated list of states terminated with a semicolon.)
Detect unclosed comments. Extend the lexer to handle string literals. This is tricky because the lexer is required to return strings with all escape sequences expanded away.
One method is to maintain
a global buffer. The lexer can extend this buffer
each time it recognizes a piece of the string and can call itself recursively via continue()
to get the remaining pieces, until it reaches the end of the string.
At this point the entire string can be returned. This technique requires adding at least one more
state to the lexer.
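The buffer approach might be sketched in ML-Lex as follows. The STRING start state (declared with %s STRING) and the two helper refs are assumptions of this sketch, and the rule set is far from complete: it handles only a couple of escapes and ignores positions of embedded escapes.

```
(* in the definitions section:
     val buf = ref ""  and  strStart = ref 0  *)

<INITIAL> \"      => (YYBEGIN STRING; strStart := yypos; buf := ""; continue());
<STRING> \\n      => (buf := !buf ^ "\n"; continue());
<STRING> \\t      => (buf := !buf ^ "\t"; continue());
<STRING> [^"\\]+  => (buf := !buf ^ yytext; continue());
<STRING> \"       => (YYBEGIN INITIAL;
                      Tokens.STRING (!buf, !strStart, yypos + 1));
```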
An alternate method is simply to match the entire string as a single token, and then write a separate translating function through which the lexer can pass the source text. This simplifies the lexer's organization, but requires you to implement your own code to detect escape sequences.
There are sample programs in $TIGER/samples to try once
you believe your lexer is working, though you will likely need to create your own
files for testing a partially-implemented lexer.
Submit your code with the following command line: cs132submit tiger.lex
Extend $TIGER/chap1/slp.sml
with the following definitions:
maxargs : stm -> int;
a type table with functions lookup : table * id -> int and
update : table * id * int -> table;
interpStm : stm * table -> table and
interpExp : exp * table -> int * table;
interp : stm -> unit.
Submit your code with the following command line: cs132submit slp.sml
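As a starting point, maxargs can be written by mutual recursion over the stm and exp datatypes already defined in slp.sml (CompoundStm, AssignStm, PrintStm, OpExp, and EseqExp are the interesting cases; maxargsExp is a helper name of my choosing):

```sml
fun maxargs (CompoundStm (s1, s2)) = Int.max (maxargs s1, maxargs s2)
  | maxargs (AssignStm (_, e))     = maxargsExp e
  | maxargs (PrintStm exps) =
      (* this PrintStm's own argument count, or a larger PrintStm
         nested inside one of its arguments *)
      foldl Int.max (length exps) (map maxargsExp exps)
and maxargsExp (OpExp (e1, _, e2)) = Int.max (maxargsExp e1, maxargsExp e2)
  | maxargsExp (EseqExp (s, e))    = Int.max (maxargs s, maxargsExp e)
  | maxargsExp _                   = 0
```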