Project 07

An Interpreter for a Dynamically-Scoped Mini-Lisp

Due by 5:00 PM on Friday April 14, 2000

 

Purpose:

 

In this phase we transform the calculator language into a mini Lisp-like language by adding structured data in the form of s-expressions.

Background:

 

S-expres- sions As Data

Before I can specify the details of the project I must first tell you about how s-expressions are used as data in Lisp. So far we have only simple data types: numbers and strings. In Lisp s-expressions are provided as the principal way of building up compound data structures out of the base types which include not only numbers and strings but also symbols, which so far we have used just for variable names. Lists are constructed using the cons operator (which just stands for construct). For instance, we could build a list of 3 integers with the expression: (cons 2 (cons 6 (cons -3 ()))) which evaluates to the value (2 6 -3). Remember that in Lisp the second item in a cons does not have to be a list itself, though in this case it was for each of the calls to cons.

In general, a cons pair is written by writing the two items of the pair inside parentheses, separated by a ".". Thus the result of evaluating (cons 2 3) is (2 . 3). However, in order to cut the nesting of parentheses, the representation is simplified according to the rule that whenever a dot immediately precedes a left parenthesis, the dot, the left parenthesis, and its matching right parenthesis, are not shown. That is why the result of (cons 2 (cons 6 (cons -3 ()))) is printed as (2 6 -3) rather than (2 . (6 . (-3 . ())))

The empty list can be written either as (), or as nil. You do not have to worry about which form the user types: the parser is already set to return Atom Nil for either representation.

CAR and CDR

To get the first thing in a cons pair the function car is used, while the function cdr is used to return the second thing in a pair (usually the tail of a list). These functions take their names from the first implementation of Lisp on an early IBM mainframe. A pair was always represented as a pair of pointers that were loaded respectively into the address and decrement registers of the cpu. Thus car is ``Contents of Address Register'' and cdr is ``Contents of Decrement Register''.

We use these functions as follows:

> (define a (cons 2 (cons 6 (cons -3 ()))))
a <- (2 6 -3)
> (car a)
2
> (cdr a)
(6 ~3)
> (car (cdr a))
6
         

The operations car and cdr apply only to cons pairs. It is an error to apply either to nil or to a symbol, string, or number (that is, to atoms).

Quote

Now, obviously, it would be nice if we could just type (2 6 -3) instead of typing (cons 2 (cons 6 (cons -3 ()))), but there is a problem:

> (define a (2 6 -3))
>>Error: Unknown operator 2 in (FUNCTION 2)
         

Since we are also using the s-expression syntax for our program syntax, the interpreter sees a list of things in parentheses as a request to treat the first item as a function and send the rest of the items to it as arguments. Of course, 2 is not a function. (This last error was from a real Lisp interpreter, the sample solution just prints "Illegal Expression".)

The way around this is the quote operator which simply says "Don't evaluate my argument!":

> (define a (quote (2 6 -3)))
a <- (2 6 -3)
> (+ 2 3)
5
>(quote (+ 2 3))
(+ 2 3)
>(car (+ 2 3))
Runtime Error -- Illegal argument to car.
>(car (quote (+ 2 3)))
+
         

For any expression exp, the expression (quote exp) can also be abbreviated 'exp, as in '(2 6 -3). However, you do not have to worry about this, as it is handled by the parser. You will always see it as though it had been typed (quote exp).

Notice quote is not idempotent (that is, applying it more than once is not the same as applying it once):

> '(2 6 -3)
(2 6 -3)
> ' '(2 6 -3)
(quote (2 6 -3))
         

Symbols as Data

Now, with quote, it becomes meaningful to have symbols as data. Suppose a and b are defined to be 3 and 4 respectively. Well, there is a difference between the variable a and the symbol a. For instance:

> (define a 3)
a <- 3
> (define b 4)
b <- 4
> a
3
> 'a
a
> (+ a b)
7
> (+ 'a b)
Runtime Error -- Illegal operands to math function +
> (car '(a b))
a
> (+ (car '(a b)) b)
Runtime Error -- Illegal operands to math function +
         

So, once they've been quoted, symbols are just symbols. They become pretty much like the elements of an enumerated type.

Note, this is not true of numbers or () which are called self-quoting. (That is, you didn't know it, but they were really quoted all along, you just didn't see the quote!):

> 3
3
> '3
3
> (+ 3 4)
7
> (+ '3 4)
7
>'(3 4)
(3 4)
> (car '(3 4))
3
> (+ (car '(3 4)) 4)
7      

Note that explicite use of quote is still not idempotent on the self-quoting items:

> '3
3
> ''3
(quote 3)   

That's because your not quoting the three (vaculously) and then quoting it again. Rather, you are quoting the act of quoting the three.

As a more complete example, if we wish to search a list for a value, we can use the following code:

(define (memb val list)
   (cond ((null list) ())
         ((= val (car list)) 'T)
         (memb val (cdr list))))
         

Here null is a predicate that tests whether its argument is the empty list.

Boolean Values

The standard treatment of boolean values in Lisp is somewhat analagous to that used in C and C++. The empty list (nil or ()) is used to represent false, and any non-nil value (including any other atom, or any other compound s-expression) is true. (Thus, 0 is now true, since it is not nil.)

Remember that when (quote exp) is evaluated, it's value is exp. So the value returned in the second case is the symbol T which is the result of evaluating the expression (quote T). That is, if we look at the ML value returned, it is just (Atom (Symbol "T")), not ((Atom (Symbol quote)) $ (Atom (Symbol "T"))). If we had left off the quote, then it would have appeared we were trying to return the value of some variable named T. Note that there is nothing special about 'T. It is just an arbitrary non-nil value that has become a standard value to use for "true".

Richer Data Structures

Because s-expressions don't have to be homogeneous like ML lists, they can be used to build arbitrary data structures. For instance, we can represent a binary search tree as a three element list, the first element of which is the root, the second of which is the left sub-tree, and the third of which is the right sub-tree. As in:

'(4 (2 (1 () ()) (3 () ())) (8 (6 (5 () ()) (7 () ())) (9 () ())))
         

which can be seen as representing the binary search tree:

To make this clearer, imagine it formatted as:

'(4 
  (2 
   (1 () ()) 
   (3 () ())) 
  (8 
   (6 
    (5 () ()) 
    (7 () ())) 
   (9 () ())))
         

Using this representation, the function to determine if a tree contains a given value would be:

(defun (tree_memb val tree)
   (cond ((null tree) ())
         ((= val (car tree)) 'T)
         ((< val (car tree)) (tree_memb val (car (cdr tree))))
         ('T                 (tree_memb val (car (cdr (cdr tree)))))))
         

Of course, it would be better to impose some level of data abstaction by first defining some methods and using them, as in:

(defun (empty tree) (null tree))
(defun (root tree) (car tree))
(defun (left-subtree tree) (car (cdr tree)))
(defun (right-subtree tree) (car (cdr (cdr tree))))
         
(defun (tree_memb val tree)
   (cond ((empty tree) ())
         ((= val (root tree)) 'T)
         ((< val (root tree)) (tree_memb val (left-subtree tree)))
         ('T                  (tree_memb val (right-subtree tree)))))

 

What I Have Provided:

 

In order to support the use of s-expressions as data, the type AbstractSyntax.datavalue must be changed from ConcreteSyntax.literal to ConcreteSyntax.sexp.

The code for SyntaxConverterFun must be modified to deal with the use of quote. Any time you see a quoted expression, the body of the expression should be wrapped with a Cdata rather than being further evaluated. The only other changes to be made to SyntaxConverterFun are those necessary to recognize and check the arity of the new built-in operators described below.

As stated above, the parser is already set up so that if the user types 'exp you will recieve an sexp as though they had typed (quote exp). You do not have to deal with that conversion yourself.

I have created a version of the SML-NJ interpreter with the parser structure SexpParser, the two syntax structures ConcreteSyntax and AbstractSyntax, and the SyntaxConverter structure, as well as the UserInterface functor, and the signature for the evaluator functors already loaded. The executable (or a link to it) is in /cs/cs131/bin/sml-sexp. The file /cs/cs131/bin/sml-cs131 (which is the version loaded by emacs) has also been re-linked to this version.

You may find it useful to diff the new version of syntaxconverter.sml against the one from the last project to get a sense of the changes.

(I have also loaded a sample solution to the evaluator functor and built a structure called testEvaluator as well. You can use the functions in this structure from the SML command line to see what the results of evaluation should be.)

I have also provided an executable sample solution at /cs/cs131/bin/sml-soln-proj7. You can use this to test how your solution should behave (from the standpoint of user interaction). This launches directly into my solution's interpreter.

Finally, the file /cs/cs131/proj_7/tests.lisp contains a variety of test functions that your solution should be able to handle.

What You Need To Do:

 

  1. You should define the built-in functions cons, car, and cdr. You should also define the built-in functions null which tests its argument and returns 'T if it is nil and nil otherwise, and atom which returns nil if its argument is a cons-pair and 'T otherwise.

    Note that while car and cdr are not defined for atoms, atom, and null are defined for all value types. This is exactly so that you can use atom to check whether or not it is safe to apply car or cdr.

  2. The built-in functions write and makestring must be extended to deal with arbitrary s-expressions.

    Note, however, that the provided function sexpToString can be used to do most of the work.

  3. The relational operators should be redefined to return nil for false and 'T for true. The evaluation of conditional expressions should also be changed to use these values (i.e. nil for false, non-nil for true) rather than zero and non-zero.

    Equality (=) and inequality (<>) defined for and among all the types of the language (numbers, strings, s-expressions). Any cross-type comparison should return nil. (Note that comparing a real and an integer is not cross-type, since both are subsumed in the general notion of number in this language.) Recall that the implementation of s-expressions is not an ML equality type (since it includes reals at the leaves) so you will need to implement equality checking in your own recursive function.

    The remaining relational operators are defined only on numbers and, separately, on strings.

  4. Suppose we want to create a list out of the values stored in the variables a and b. We can't write '(a b) since the quote blocks the evaluation of the variable names to their respective values. Instead we have to write (cons a (cons b ())) which is pretty cumbersome. Therefore, Lisp defines the function list which takes an arbtrary number of arguments, evaluates them, and then builds a list out of the results. I.e.:
    > (define a 3)
    a <- 3
    > (define b 4)
    b <- 4
    > '(a b)
    (a b)
    > (list a b)
    (3 4)
         

    Add the definition of list to your interpreter.

  5. Suppose we wish to extract the 3 from the s-expression ((0 1) (2 3)) which is stored in the variable a. This requires typing the incredibly horrible expression (car (cdr (car (cdr a)))). Because Lisp does not have pattern matching it is often necessary to write such expressions for extracting parts of a structure.

    To simplify such expressions, most Lisp implementations allow you to abbreviate any combination of car and cdr expressions by simply typing the corresponding combination of as and ds, surrounded by a c and an r. So, for instance, the example expression becomes (cadadr a) which, while still hard to figure out, is at least easy to type.

    In the binary search tree example above, we could now define the tree accessor functions as:

    (defun (left-subtree tree) (cadr tree))
    (defun (right-subtree tree) (caddr tree))

    Add an implementation of this shorthand to your interpreter. That is, make the interpreter able to understand any such abbreviation of car and cdr expressions. Note that the syntax converter has already been modified to recognize these as unary built-in operators.

What to Submit:

 

As with the last project, only the evaluator and the syntax converter get appreciable changes in this phase. And, as before, the changes to the syntax converter and user interface have already been made for you. Therefore, you should submit just the new implementation of EvaluatorFun.

The make file in the /cs/cs131/proj_7 directory can be used to test-build your solution. A file of test lisp expressions is also available in that directory.


This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Fri, March 31, 2000.
http://cs.hmc.edu/~hodas/courses/cs131/projects/project07.html