Assignment 4

Harvey Mudd College
Computer Science 42
Assignment 4, Due Monday, Sept. 26, by 11:59pm

Unicalc and Unilang

This week we will complete the implementation of our Unit Calculation language: Unilang. This homework asks you to use Racket to implement the Unicalc language, using the Unicalc API that you completed in the previous 2 homeworks.

You've already completed the back-end API that performs all of the Unicalc calculations (with error propagation!). This week you will make the front-end easier to use by implementing a language based on the unit-conversion capability. Implementing the language will require you to have a function to tokenize the input (you have the option to use the one we've provided, if you wish), a function to parse the input, and a function to evaluate the parse trees. Examples and, optionally, starting points for the Parser and Evaluator are provided below.

The language lets you evaluate expressions, and to extend the database with new constants or units. Here is a sample interaction that you can use to check your results and to make sure you understand what you are trying to do with the evaluator. There are three things that are helpful to know when reading this. (1) When "~" appears between two numbers, it means "plus or minus", so the first portion of the interaction is telling us that 14 +/- 4 meters added to 9 +/- 3 meters results in 23 +/- 5 meters. (2) The operator "#" means "normalize"; (3) "def" is used to extend the database (although only during this run of the read-eval-print loop).

> (read-eval-print)


input > 14~4 m + 9~3 m
tokens: '(14 #\~ 4 "m" #\+ 9 #\~ 3 "m")
ast   : '(eval (add (uql 14 4 (m) ()) (uql 9 3 (m) ())))


result: 23.00000~5.00000 meter


input > 60 Hz * 30~1 s
tokens: '(60 "Hz" #\* 30 #\~ 1 "s")
ast   : '(eval (mul (uql 60 0 (Hz) ()) (uql 30 1 (s) ())))


result: 1800.00000~60.00000 Hz s


input > # 60 Hz * 30~1 s
tokens: '(#\# 60 "Hz" #\* 30 #\~ 1 "s")
ast   : '(eval (norm (mul (uql 60 0 (Hz) ()) (uql 30 1 (s) ()))))


result: 1800.00000~60.00000 


input > # 364.4 smoot
tokens: '(#\# 364.4 "smoot")
ast   : '(eval (norm (uql 364.4 0 (smoot) ())))



result: 364.40000 smoot


input > def smoot 67 in
tokens: '("def" "smoot" 67 "in")
ast   : '(def smoot (uql 67 0 (in) ()))


input > # 364.4 smoot
tokens: '(#\# 364.4 "smoot")
ast   : '(eval (norm (uql 364.4 0 (smoot) ())))


result: 620.13592 meter

Starter Files

tokenizer.rkt - A complete tokenizer for this assignment
parser.rkt - A starting point for the parser
evaluator.rkt - A starting point for the evaluator
parser-tests.rkt - Tests for the parser
evaluator-tests.rkt - Tests for the evaluator

If you want to use our A3 solution to do the UQL computations:

unicalc.rkt - Our solutions to Assignment 3: unicalc with error propagation
unicalc-db.rkt - Our solutions to Assignment 3: make-UQL

If you want to use your own unicalc code: That's great! You'll just need to make a few small changes:

At the end of unicalc-db.rkt, add the 6 lines

(provide extend-db)
;; Function to add new pairs to the database.
(define (extend-db new-unit new-QL)
   (set! unicalc-db (cons (list new-unit new-QL) unicalc-db))
   (set! UDB unicalc-db)
)

Near the start of unicalc.rkt (e.g., after the (requires ...) line), add the 2 lines:

;; Export (only) these functions to other modules
(provide add subtract multiply divide power normalize)

Why so much code provided? I'd prefer to write the application myself... .

Be our guest! With the support code we are balancing two valid concerns: on one hand, the deep understanding that a from-scratch implementation provides and, on the other hand, enough examples and framework to build familiarity with what are certainly intricate - and important! - computational ideas. Feel free to use as much - or as little - of the support code as you wish, e.g., writing your own tokenizer. If you implement from scratch is to use (or replicate) our output conventions, so that we can check your results efficiently!

Problem 1: Parsing the Unicalc language (INDIVIUAL OR PAIR). Submit in a file named parser.rkt

60 points

This problem asks you to implement a Parser for the Unicalc language, whose grammar is as follows:

T -> def V L | L
L -> # E | E
E -> P + E | P - E | P
P -> K * P | K / P | K
K -> U ^ I | U
U -> ( L ) | - U | Q
Q -> D ~ D V* | D V* | V V* 
       (where V* means 0 or more V's)


I -> (any nonnegative integer) | - (any nonnegative integer)
D -> (any nonnegative number)
V -> (any variable name, i.e., any string token)

The tokenizer

The provided tokenizer.rkt is complete as given, although you are free to ignore it and write your own tokenizer. The function tokenize takes a string, and returns a list of tokens. The tokens are of 3 forms:

Numbers. These are ordinary non-negative racket integer or floating-point numbers. They will return #t if you test them with (number? ...), and may or may not return true if you test them with (integer? ...) (depending on whether they are integer or floating-point).
Single characters. The symbolic operators in the unicalc language are #, ~, +, -, /, *, ^ (, and ). Each is returned as a single racket character (i.e., #\# or #\~ or #\+ or ...).
Strings. Any other word in the input (def, mile, s, pound, ...) becomes a string token ("def", "mile", "s", ...).

So, for example,

(tokenize "def x -6 mile/hour * 7.0 minute")
  =====>   '("def" "x" #\- 6 "mile" #\/ "hour" #\* 7.0 "minute")

The parse function

Your main task is to implement a function named (parse input-string) that takes a string, and returns a list of the following form:

( parse-result residual )

The first item in the list, the parse-result is the abstract syntax tree for the input (see below for examples).

The second item in the list, the residual, is the rest of the tokens in the input list that are not included in the parse tree (or an empty list, if all tokens were accounted for).

If the input cannot be parsed, the function should call the error function.

Of course you will implement many helper functions along the way. It is important to keep your functions short, focused and well-commented!

Here are some examples of abstract syntax trees - keep in mind that they can be composed, as well. All non-numeric items in the parse tree should be symbols.

USER INPUT                           ABSTRACT SYNTAX TREE
----------                           -----------------------------
21 * 2 meter                         (eval (mul (uql 21 0 () ()) (uql 2 0 (meter) ())))

def x  42 ~ 2 meter                  (def x (uql 42 2 (meter) ()))

# x                                  (eval (norm (uql 1 0 (x) ())))

def z  y + g - x                     (def z (add y (sub g x)))  
                                         // the grammar is right associative, unfortunately

def x 3.42 ~ .75 foot foot / second  (def x (div (uql 3.42 0.75 (foot foot) ()) (uql 1 0 (second) ())))
                                         // This is interpreted as 3.42 +/- .75 ft ft
                                         //   *divided by* (implicitly 1) second.

def z y+(x^2-(z/y))                  (def z (add (uql 1 0 (y) ()) 
                                                 (sub (pow (uql 1 0 (x) ()) 2) 
                                                      (div (uql 1 0 (z) ()) 
                                                           (uql 1 0 (y) ()))))) 

def z y+x^2-z/y                      (def z (add (uql 1 0 (y) ()) 
                                                 (sub (pow (uql 1 0 (x) ()) 2) 
                                                      (div (uql 1 0 (z) ()) 
                                                           (uql 1 0 (y) ())))))

In the copy of the grammar below, each rule is annotated with a comment that explains how to build an abstract syntax tree out of the abstract syntax trees for the pieces. In several cases there's nothing to do, i.e., if an E is simply a P, then the abstract syntax for the E is exactly the abstract syntax returned by the recursive call to parse-p.

T -> def V L        ;; (def V L) 
   | L              ;; (eval L)


L -> # E            ;; (norm E)
   | E              ;; E


E -> P + E          ;; (add P E) 
   | P - E          ;; (sub P E)
   | P              ;; P


P -> K * P          ;; (mul K P)
   | K / P          ;; (div K P)
   | K              ;; K


K -> U ^ I          ;; (pow U I) 
   | U              ;; U


U -> ( L )          ;; L
   | - U            ;; (neg U)
   | Q              ;; Q


Q -> D V*           ;; (uql D 0 V* ())
   | D ~ D V*       ;; (uql D1 D2 V* ())
   | V V*           ;; (uql 1 0 (cons V V*) ())


I -> (any nonnegative integer)   ;; the integer (not a list) 
   | - (any nonnegative integer) ;; the negative integer (not a list)


D -> (any nonnegative number)    ;; the number (not a list)


V -> (any variable name, i.e., any string token) ;; a symbol

Other methods for Parser Recursive-descent parsing uses a method for each nonterminal in the grammar. One way to build up to the full grammar is to implement one nonterminal method at a time, for example, from the bottom towards the top. So, you would implement and test parse-q, parse-u, and then parse-k, and so on. If you use the skeleton code, you will need to modify parse-t, finish writing parse-q, and implement the remaining parsing functions.

Testing your code While coding, you will want to call the individual parse-X functions with hand-created lists of tokens, e.g.,

(parse-q '("volt" "second"))
(parse-q '(1 "mile"))
(parse-q '(7.0 #\~ 0.5 "kw" "hour"))

In the end, you can test your parser in the usual way, by calling test with the parse function (which in turn calls parse-input). We provide a number of tests in the file parser-tests.rkt. You should add more tests of your own.

Part 3: Evaluating the Unicalc language (INDIVIDUAL ONLY) Submit in a file named evaluator.rkt

40 points

This part completes the unicalc language. Here, you should build a interpret-unicalc function (or extend the one that's already in the starter file). interpret-unicalc should take as input a valid abstract syntax tree and either update the database or display the value of the AST as appropriate.

Again, it's up to you whether you want to use the starter file, but you need to support the following functions:

(read-parse-print) - reads the user's input and prints the results of parsing the input and the final result of evaluating the input to the screen.
(interpret-unicalc ast) - do whatever is implied by the parse tree, which should have eval or def at the root.
(eval-expr ast) - returns the UQL that is the value of a given quantity expression.

The starter file contains complete code for the first two of these functions, but has a very incomplete eval-expr.

What does everything evaluate to? Well, every expression that can be produced by the L nonterminal in the grammar evaluates to a UQL. This is described below. However, first a quick note about variables and environments. For this assignment we will work with only one environment to store the values of all of our variables. This environment will be global and accessible from everywhere, hence we will not need to pass it as an argument to our interpret-unicalc unical-eval function. In other words, all variables are global variables, and our functions will not take any parameters.

Evaluating T's: Top-level statements (T's) have no value, but rather do something.

If the AST is of the form (def varname L), then it has the side effect of adding the symbol varname to the unicalc-db; the associated UQL value is the value of L. The unicalc-db.rkt file now contains a function extend-db that takes two arguments: a symbol, and a UQL, and sticks these onto the front of the unicalc-db used by add, subtract, normalize, ....
If the AST is of the form (eval L), then evaluate the given expression to a UQL and display the result.

Evaluating L's (and below): Most of these have a value that is a UQL (with obvious exceptions of I, D, and V)

If the AST is of the form (norm X), then evaluate X to an UQL, normalize it, and return the result
All other parse trees (add X Y), (neg X), (pow X n), ... are even more straightforward. They simply evaluate to the result of applying the relevant arithmetic functions from unicalc.rkt to the UQL(s) that are the values of the subexpressions.

Testing your code

We have again provided some tests for your evaluator, but you should add more tests.

Optional extra credit: language extensions! [INDIVIDUAL] Submit in a file named unilangExtra.zip (Put all necessary code, including support files, into this file.)

For up to an additional +25 points, provide any extension you wish to the Unicalc language. Small extensions will be worth small amounts of points (e.g., new kinds of tokens and mathematical operations, or left-associativity), but large and difficult extensions, such as implementing "real" functions, together with parameters and local scope, will be worth large amounts of points. How to design and implement this is 100% up to you.

Include in your zip file a file readme.txt that describes what you did, how you changed your Parser and Evaluator, and how we can test it to see it working. Give the graders a couple of examples to try!