The directory $TIGER is /cs/cs132/src/tiger
on turing.
Start with the files in $TIGER/chap11; these contain
a complete implementation of the code for Chapter 10, which creates an
interference graph for a sequence of assembly instructions.
Implement the Color and RegAlloc structures,
as shown on Page 253 in Appel. Your code need not handle coalescing
or spilling.
Because of this, it may be easier to implement optimistic coloring
from scratch --- based on the presentation given in class --- rather
than trying to simplify the pseudocode in Chapter 11.
When you
implement the simplify phase (picking an order to remove nodes from
the interference graph) you should not actually modify the graph.
(You'll need the original graph back for the actual coloring phase.)
One possibility --- though certainly not the only one --- is to keep a
list of the nodes not yet removed from the graph, and a mapping
telling for each node in the graph how many of its neighbors have not
yet been removed. There is an implementation of such lookup tables in
the structure Graph.Table.
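The bookkeeping just described can be sketched in Python (the course code is SML, and the names here are hypothetical; this is only an illustration of the idea):

```python
def simplify_order(adj, k):
    """Pick an order in which to 'remove' nodes for the simplify phase,
    without ever modifying the interference graph `adj` itself."""
    remaining = set(adj)                      # nodes not yet removed
    degree = {n: len(adj[n]) for n in adj}    # counts of not-yet-removed neighbors
    stack = []
    while remaining:
        # prefer a node of degree < k; otherwise pick one optimistically
        node = min(remaining, key=lambda n: degree[n])
        stack.append(node)
        remaining.remove(node)
        for m in adj[node]:
            if m in remaining:
                degree[m] -= 1                # only the count changes, not adj
    return stack
```

Popping the stack in reverse order and consulting the original, unmodified adj then gives the coloring phase exactly the graph it needs.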
Appel's suggestion that we add an instruction that uses all
the special registers and the callee-save registers doesn't work
in this case, for a couple of reasons.
It is therefore probably easiest to create two lists:
SparcFrame.registers
which contains all of the Sparc registers (which can be used as
the values to map Temporaries 0..31 to hardware registers) and
another list, something like
SparcFrame.useableRegisters
which contains the subset of Sparc registers to which you would
be willing to allocate temporaries (e.g., not the stack pointer
or the zero register).
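In Python terms (the course code is SML; the particular reserved set below is one plausible choice for illustration, not a specification):

```python
# All 32 Sparc integer registers, in an order that lets temporaries
# 0..31 map to hardware registers by index.
registers = [f"%{bank}{i}" for bank in ("g", "o", "l", "i") for i in range(8)]

# A plausible reserved set: the zero register %g0, the stack pointer %o6
# (%sp), the frame pointer %i6 (%fp), and the return-address registers
# %o7 and %i7. Adjust this to match your own conventions.
reserved = {"%g0", "%o6", "%o7", "%i6", "%i7"}
usable_registers = [r for r in registers if r not in reserved]
```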
All that's left to do after the register allocator is working is cleanup:
Finish the procedure entry/exit routines as specified on
pages 208 and 260--262. For procEntryExit3 you may
assume that every function needs space in its stack frame for 6
outgoing arguments (i.e., that no function passes more than 6
arguments to its callees). This function should probably
add the save and ret/restore
instructions, and an initial .global (function-label) assembler
pseudo-op to make sure the function's label is visible to other files.
Thus, the save instruction in the prolog must reserve
92 bytes + (number of escaping_locals * 4 bytes),
making sure to keep the stack 8-byte aligned and also remembering that
the argument to save should be negative since the stack grows
downwards. (92 = 6 outgoing args (minimum permitted) * 4 bytes
+ 64 bytes for register-window spilling
+ 4 bytes for (unused) pointer for functions returning
structures.)
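The arithmetic above, as a small sketch (the helper name is hypothetical):

```python
def save_argument(escaping_locals):
    """Size argument for the `save` instruction in the prologue:
    92 bytes of fixed overhead plus 4 bytes per escaping local,
    rounded up to keep the stack 8-byte aligned, and negated
    because the stack grows downwards."""
    size = 92 + 4 * escaping_locals
    size = (size + 7) & ~7        # round up to a multiple of 8
    return -size
```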
The Sparc register windows take care of callee-save registers for us automatically, so there's no point in starting the routine by moving the callee-save registers into temporaries, and finishing by moving them back.
Next, the Frame.string function needs to emit assembly code
for string data. The output could look something like:
.section ".data"
(the-label):
    .word (the-length)
    .ascii (the-string-in-quotes)
    .align 4
.section ".text"
[Actually, since strings are read-only, there's little point in putting them in the data segment, rather than leaving them in the read-only text segment with the code. But the above is a perfectly good example of how to allocate in the data segment.]
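A sketch of such an emitter in Python (your SparcFrame.string does the equivalent in SML; the helper name is an assumption, and real strings would also need their quotes and escapes handled):

```python
def string_fragment(label, s):
    """Assembly text for one string literal: length word, then the
    characters, padded back to a word boundary."""
    return "\n".join([
        '.section ".data"',
        f"{label}:",
        f"\t.word {len(s)}",
        f'\t.ascii "{s}"',        # assumes s needs no further escaping
        "\t.align 4",
        '.section ".text"',
    ])
```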
For testing purposes, the function Translate.getResult
should probably reset the list of fragments back to nil;
otherwise fragments from one test can appear in the output of future
tests.
More hints on getting code running (some of which I probably should have mentioned for the previous assignment; sorry):
The supplied runtime.c may
require modifications:
its getchar may conflict with the standard function
getchar; the simplest fix is to rename the function to something like
getchr and change env.sml and
runtime.c appropriately.
The functions in Env.base_venv are
called using the standard calling convention rather than the
external calling convention. Thus, these functions all need to be
given an extra (dummy) initial static link argument (as in the
inttostring example below).
Emit a .align 4 directive
so that the next thing emitted falls on a word boundary.
compute argument 0 into temporary t0
move t0 into %o0
compute argument 1 into temporary t1
move t1 into %o1
...etc...
call function.
And this is fine unless, for example, while computing argument 1
the code writes to %o0.
Now the only reason this should occur is if the computation of
argument 1 required a function call, and the code supplied by Appel
explicitly contains a phase that gets rid of such nested function
calls, so why is there a problem?
The answer is that multiplication and division are implemented on
the SPARC by procedure call, so if argument 1 involves a
product or quotient the above code isn't going to work. The
canonicalizing code doesn't know that these arithmetic operations
are really function calls, because on other architectures they
won't be.
We could modify canonicalize to be no-longer machine independent,
but probably a better solution is to generate the following code
instead:
compute argument 0 into temporary t0
compute argument 1 into temporary t1
...etc...
move t0 into %o0
move t1 into %o1
...etc...
call function.
This applies not only to the arguments of normal procedure calls,
but also to the operands of multiplication and division.
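In schematic form, with strings standing in for real instruction emission (a sketch, not the Assem interface):

```python
def move_args_to_registers(args, arg_regs):
    """Emit (as strings) code that first computes every argument into a
    fresh temporary, then moves the temporaries into the outgoing
    registers, so no argument computation can clobber an already-loaded
    %o register."""
    compute, moves = [], []
    for i, a in enumerate(args):
        t = f"t{i}"
        compute.append(f"compute {a} into {t}")   # may itself contain calls
        moves.append(f"move {t} into {arg_regs[i]}")
    return compute + moves
```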
Adding helper functions to runtime.c can make debugging
easier.
By adding "inttostring" to the initial environment in env.sml, you
can then write test programs that print out integers without
requiring that you write the integer-to-string conversion in Tiger.
(The first parameter is an unused static link.)
struct string *inttostring(void *sl, int a)
{
    struct string *t = (struct string *)malloc(sizeof(int) + 21);
    sprintf(t->chars, "%d", a);
    t->length = strlen(t->chars);
    return t;
}
Submit your code as chapter 12.
You will also need canon.sml from
$TIGER/chap8 and the files in $TIGER/chap9.
This assignment has two parts: implementing the function
SparcGen.codegen (the signature CODEGEN
is given on page 207 of Appel), and extending the Frame
signature and definition as specified on pages 208-209.
The function SparcFrame.string : Tree.label * string -> string
is described on page 262 of Appel (see also the figure on page 260).
It returns the SPARC assembly language that emits the given label and
the given string. For this assignment,
write a function that returns any string you want. (This will
be fixed in a later assignment.)
The code in $TIGER/chap9/main.sml also assumes that your
semant.sml has a function transProg which
creates a new level for the main routine (with label
"tigmain"), invokes transExp, passes it to
Translate.procEntryExit to make a fragment for the main
program, and finally calls Translate.getResult to
return all the fragments.
Be careful with MOVE. Normally a T.TEMP or
a T.MEM appearing in an expression stands for the value
in the specified location. However, when it appears on the
left-hand side of a MOVE it is the location itself that
is being referenced. So the right way to generate code for
MOVE(e1,e2) is not to generate code that starts by
computing the values of e1 and e2.
The dst and src parts of an
Assem.instr are exactly the def and use sets seen in class.
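For instance, rendered in Python (Assem is an SML structure; this tuple is just an illustration of the correspondence):

```python
from collections import namedtuple

# A stripped-down Assem.OPER: the template refers to dst/src by position.
Instr = namedtuple("Instr", "assem dst src")

# add t1, t2 -> t3 : defines t3, uses t1 and t2
add = Instr(assem="add `s0, `s1, `d0", dst=["t3"], src=["t1", "t2"])

defs = set(add.dst)   # the def set for liveness analysis
uses = set(add.src)   # the use set
```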
Combine the files in $TIGER/chap7 with a copy
of your code. Of particular interest are the provided
sparcframe.sml, frame.sig and the skeleton
signature and implementation for Translate.
You may change any of the files however you like (e.g., rewrite code,
change the definitions of types, etc.), but anything machine-specific
should go in the sparcframe.sml file, and anything
related to generic generation of intermediate-code should go in the
translate.sml file.
Modify semant.sml to keep track of the
current nesting level as it goes along. This will mean that functions
like transExp and transDec will need to take a
new argument of type Translate.level, and you will need
to create new sub-levels via Translate.newLevel for each
function definition.
For each function, there is supposed to be a new level. This level
must be associated with the function in the environment, and
also be passed in when translating the body of the function
and when calling procEntryExit.
This can be slightly awkward if your code to extend the environment
is completely separate from the code that compiles the function bodies.
One solution: Create the new levels when extending the
environment; look up the level in the
environment when doing the actual translation
and pass the function's level to transExp and procEntryExit.
Modify semant.sml to call
Translate.allocLocal for every local variable declaration
it comes across (i.e., for every var declaration and every
for-loop index). You do not call allocLocal
on function arguments, though; you can get the access information for
formal parameters by calling Translate.formals.

Modify env.sig and env.sml to keep
track of the access and level information for all variables and functions,
as shown on Page 141 of Appel. Also, generate a unique Temp.label for each
function and store this in the environment as well.

Modify semant.sml to return not only the
type of the code, but the corresponding intermediate representation.

Add functions to the Translate module and its signature that actually do the
translations; each one should generate the intermediate code for a
particular language construct given the intermediate code for all
of the sub-expressions.

Each new level contains a frame.
(You can use SparcFrame.formals to get the access information for
formal parameters including the static link, which will be at the
head of the list.)

Each completed function body should be passed to
Translate.procEntryExit and stored as a "fragment".

Modify the parse.sml file if your changes
make it stop typechecking.

SparcFrame or Translate modules, or anywhere else.
Implement the module Semant, as specified on pages 121-122 of Chapter
5. Be sure
to read the chapter and familiarize yourself with $TIGER/chap5/types.sml before
starting;
the Tiger type system is a bit different.
The only thing one can do with an SML value of type unit ref (other
than assign it the unit value --- which it already contains --- or access
this unit value) is to compare it with another unit ref for equality. Thus,
one can (and Appel's
supplied code does) use unit refs as an easy way to get "unique
stamps" to distinguish between structurally identical types. One could just
as well use a global counter to generate unique type ids; the unit ref hack
is arguably simpler.
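The global-counter alternative looks like this in Python (hypothetical names; Appel's SML code uses unit refs instead):

```python
import itertools

_stamp = itertools.count()   # source of unique type ids

def new_record_type(fields):
    """Structurally identical record types still get distinct stamps."""
    return {"fields": tuple(fields), "stamp": next(_stamp)}

def same_type(t1, t2):
    # compare stamps, never structure
    return t1["stamp"] == t2["stamp"]

# Two records with identical structure are nonetheless different types.
t1 = new_record_type(["x", "y"])
t2 = new_record_type(["x", "y"])
```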
PLUS:
Implement the module FindEscape as discussed on pages
138--139 of the text. Your module must implement the function
findEscape: Absyn.exp -> unit, but the names
and types of any helper functions you define need not exactly
match Appel's suggestions.
This should all be submitted as assignment 5.
Start with the files in $TIGER/chap4. The text
of the chapter contains some example abstract syntax that might be informative.
Remember to group together adjacent type or function definitions into a single abstract syntax declaration.
You'll also need to change the %nonterm line to declare that
your nonterminals now have interesting values associated with them rather than
values of type unit.
For example, if you have a nonterminal exp then you'll need to
say exp of Absyn.exp.
%name, for example.)
$TIGER/chap3/tiger.lex.sml does
not appear to understand comments: it sees /* as DIVIDE
TIMES. So either delete your copy of this tiger.lex.sml
file and plug in your own lexer (modified as on page 82 of the text!) or else
remove all comments from your test inputs.
There are sample inputs in $TIGER/testcases,
but you will almost certainly have to devise your own small tests. If a test
fails to parse, it can be useful to simplify it as much as possible while
retaining the error; this is particularly easy given that the resulting code
need not typecheck or make sense, so long as it is grammatically correct.
(For example, any expression can be replaced by 3, and so on.)
ML-Yacc may report that some rules are never reduced in your
program. This is probably not an error (unless you think you've
finished writing the grammar), but know that these rules are being discarded
rather than checked for conflicts.
The Tiger language definition restricts the use of nil, requires break
to appear only within loops, requires that loop bodies not have a value, etc.
These semantic constraints will all be checked later and should not affect
your grammar.
int and string are just predefined identifiers
and will not appear in the grammar.
You may be tempted to write a single production exp opn exp, where opn
expands to one of the arithmetic or comparison operators. Resist the temptation.
(It interferes with resolving precedence of operations.)
You will need to declare precedences for do, then, else,
the assignment operator := and so on. (For example, the parser
needs to know when seeing while e do e + e whether this is a
while loop added to an expression, or whether the loop body is
e+e. This can be a little tricky; I accidentally made then
and else left associative with the same precedence, and this
caused the parser to group if e1 then e2 else e3 as (if
e1 then e2) else e3, which gets rejected.)
The tiger.grm.desc file contains lines like reduce by
rule 28. This refers to the 29th production rule occurring in the input
(the first rule is rule 0!). It is much easier to find rule 28
if you set up the grammar with one line per rule and no blank lines. Then
just put the cursor on the first rule and go down 28 lines. (In emacs you
can do this by ctrl-u 2 8 followed by the downarrow or ctrl-n.)
If you put %nodefault in your grammar file, the .desc
file will explicitly list every decision for every lookahead token in every
state. This doesn't actually give you any more information, but I found it
slightly easier to think about.
My grammar included the productions "lvalue -> ID" and "lvalue
-> lvalue LBRACK exp RBRACK" among others. Since I also had "exp
-> lvalue" I expected an input like "x[0]" to correctly
parse as an expression (hence as a program). But it didn't until I added the
seemingly redundant production "lvalue -> ID LBRACK exp RBRACK".
It turns out that there was a relatively serious shift/reduce conflict
involving the production "exp -> ID LBRACK exp RBRACK OF exp".
Consider the input "x[3]+1". The right thing to do is to shift
the ID, reduce it to an lvalue, shift LBRACK,
shift INT and reduce it to an exp, shift RBRACK,
and then reduce "lvalue LBRACK exp RBRACK" to an lvalue
and then to an exp. Now consider the input "x[3] of 4". Here the right thing
to do is to shift ID, shift LBRACK, shift INT
and reduce it to an exp, shift RBRACK and OF,
shift the second INT and reduce it to an exp,
and finally reduce "ID LBRACK exp RBRACK OF exp" to an exp.
Note that in the first case we must reduce the identifier x
to an lvalue and in the second case we must not. This is a serious
problem because we have to make the decision having seen only the identifier
and the left bracket, before the two cases can be distinguished.
By default, in the case of a shift-reduce conflict the parser generator
always chooses to shift. Half the time this will be correct. The other half
you end up with "ID LBRACK exp RBRACK" on the stack with a
lookahead other than OF, and the parser has no way to reduce
this; it is not the literal right-hand-side of any production rule. This
is why adding the new production "lvalue -> ID LBRACK exp RBRACK"
is necessary: it lets the parser reduce the array access to an lvalue even
though it made the "wrong" choice earlier by not reducing the
identifier.
So in summary, you will probably need to add a rule like "lvalue
-> ID LBRACK exp RBRACK" even though this looks redundant. This is
in fact the redundant rule mentioned by Appel. (Of course if your parser
is arranged very differently than mine, you might not need this rule.)
Here is an example of such a conflict from my tiger.grm.desc file:
error: state 13: shift/reduce conflict (shift LBRACK, reduce by rule 18)
state 13:
exp : ID . LPAREN exps RPAREN
exp : ID . LBRACE efields RBRACE
exp : ID . LBRACK exp RBRACK OF exp
lvalue : ID . (reduce by rule 18)
lvalue : ID . LBRACK exp RBRACK
EOF reduce by rule 18
COMMA reduce by rule 18
SEMICOLON reduce by rule 18
LPAREN shift 38
RPAREN reduce by rule 18
LBRACK shift 37
RBRACK reduce by rule 18
. error
This reports that a certain state (#13) in the DFA, containing the given items,
has a shift/reduce conflict. In this example, the conflict occurs when the
parser has just seen an identifier [we know this because of where the dots
are in the items] and the lookahead is a left bracket [we know this because
of the error message]. The parser can either shift the bracket and keep going
(hoping to get something like ID[exp] OF exp) or immediately
reduce using rule 18, lvalue : ID.
The lines following the items show what the parser has decided to do for
this state. If the lookahead is EOF or COMMA or
SEMICOLON then it will reduce the ID to an lvalue
(justified by rule 18). If the lookahead is LPAREN it will
shift the LPAREN onto the stack and go to state 38, and so
on. If the lookahead doesn't occur in this list, the parser will take the
default action (in this case, report an error.) Note that in the problem
case of LBRACK, the parser has decided to shift the bracket,
which as discussed above is probably what I wanted.
Start with $TIGER/chap2.
The given tiger.lex is nearly empty, and will need a lot of work to
fill in. One possible order would be:
yytext. (States other than INITIAL must be declared
in the second (Lex-Definitions) portion of the input. The format is a line
starting with %s and then a space-separated list of states terminated
with a semicolon.) Detect unclosed comments.

Extend the lexer to handle string literals. This is tricky because the lexer is required to return strings with all escape sequences expanded away.
One method is to maintain a global buffer. The lexer can extend this buffer
each time it recognizes a piece of the string and can call itself recursively
via continue() to get the remaining pieces, until it reaches
the end of the string. At this point the entire string can be returned.
This technique requires adding at least one more state to the lexer.
An alternate method is simply to match the entire string as a single token, and then write a separate translating function through which the lexer can pass the source text. This simplifies the lexer's organization, but requires you to implement your own code to detect escape sequences.
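A sketch of the second method's translating function in Python (the real lexer is written in ML-Lex; only a few of Tiger's escapes are handled here, and the function name is an assumption):

```python
def expand_escapes(lexeme):
    """Expand escape sequences in the source text of a string literal
    (surrounding quotes already stripped)."""
    out, i = [], 0
    while i < len(lexeme):
        c = lexeme[i]
        if c == "\\":
            nxt = lexeme[i + 1]
            if nxt == "n":
                out.append("\n"); i += 2
            elif nxt == "t":
                out.append("\t"); i += 2
            elif nxt.isdigit():                        # \ddd: 3-digit ASCII code
                out.append(chr(int(lexeme[i + 1:i + 4]))); i += 4
            else:                                      # \\ , \" , etc.
                out.append(nxt); i += 2
        else:
            out.append(c); i += 1
    return "".join(out)
```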
There are test files in $TIGER/testcases to try
once you believe your lexer is working, though you will likely need to create
your own files for testing a partially-implemented lexer. Submit your code with
the following command line: cs132submit tiger.lex
Extend $TIGER/chap1/slp.sml
with the following definitions:
maxargs : stm -> int.

A type table, with lookup : table * id -> int and update :
table * id * int -> table.

interpStm : stm * table -> table and interpExp
: exp * table -> int * table.

interp : stm -> unit.

Submit your code with the following command line: cs132submit slp.sml