Project 04

Interpreter for a Simple Calculator Language (Phase 1)
Evaluation and User Interface

Due by 5:00 PM on Tuesday, February 29, 2000

 

Purpose:

 

In this project you will begin the construction of the interpreter for a simple calculator language supporting typical mathematical operations, storage in named variables, and user-defined functions. This interpreter will be built over a series of three projects.

In the first two phases you will implement a simplified version of the language supporting only numerical computation. In the third phase you will enrich the language with a string data type, as well as conditional evaluation and other features. In later projects you will further advance the language by adding modern data structures and scoping rules.

Throughout this development process, the code for the interpreter will be divided into three major parts, each implemented by its own SML functor:

  • The Syntax Converter - This module contains routines to convert the user syntax of the language to a more abstract internal form that is easier to manipulate and process.

  • The Evaluator - This module contains routines that evaluate expressions represented in the internal syntax and return the appropriate result value.

  • The User Interface - This module contains routines which handle all communication with the user, and call on the routines in the other two modules to do most of the actual work.

In this first phase of constructing the interpreter you will build initial versions of the second and third of these modules.

The Concrete Syntax of the Language

 

At the input prompt, the user may type either an expression, or a command. If they type an expression, it is simply evaluated and the result printed. If they type a command, then the command is executed, and the user is returned to the input prompt. The command, however, may effect the context in which future expressions and commands are processed (by, for example, defining new functions that can be used in expressions).

Expressions

Expressions have the following syntax:

    binary-op ::= "+" | "-" | "*" | "/"
 
    exp-list ::= epsilon
               | exp exp-list
 
    exp ::= integer
          | real
          | id
          | "(" binary-op exp exp ")"
          | "(" id exp-list ")"
         

Thus they are either numeric literals, identifiers (i.e. variables), the application of a built-in prefix binary operator to a pair of expressions, or the application of a user defined operator to a (possibly empty) list of arguments (epsilon is the empty input). The application of a built-in or user-defined operator to a list of arguments is denoted by wrapping the application in parentheses (which are not optional).

Identifiers are constructed of one or more characters from a rich set of characters (as specified in /cs/cs131/proj_4/sexp-parser/sexp_lex) with no restriction on order of characters. Numbers are written with optional leading unary minus (written with a dash, not tilde) and optional decimal part. (if a decimal point is used, there must be digits before and after the point.) Because of the free syntax of identifiers, spacing is important in source programs. Running identifiers or numbers together with operators will likely lead to a valid identifier, rather than an application. For example (-3e) is the application of the identifier -3e to no arguments; it is not the same as the expression (- 3 e), which subtracts e from 3.

The following are all valid expressions:

3 
a 
(+ 3 2) 
(* a (+ 4 b)) 
(add1 -3.34) 
(power 3 -4)
(power -3 (sqrt (+ 3 c)))
(bigsum 3.4 5 -2 45 13)
(- + *) 
            

Because variable identifiers and function identifiers live in entirely different name spaces, in this last example the meaning is the function - applied to the variables + and *, where those variables are just like any other variables holding numeric values.

Commands

There are four commands, all of which can be used only at the interpreter's prompt. They are exit, which closes the interpreter, bindings, which causes the interpreter to display a list of the current variable and function bindings, define, which takes a variable name and an expression, evaluates the expresion and binds the variable to the result, and defun, which takes an operator name, a list of formal parameters, and an expression for the body of the function, and binds the operator name to that definition. All of the commands are used with enclosing parentheses.

Anything typed at the prompt that does not match one of these commands is assumed to be an expression to be evaluated and its result printed.

Thus the syntax for acceptable inputs at the prompt is:

    input ::= exp
            | "(" "exit" ")"
            | "(" "bindings" ")"
            | "(" "define" id exp ")"
            | "(" "defun" "(" id-list ")" exp ")"
         

Note that the four commands are reserved words, and should not be allowed to occur other than in their use as commands at the top-level interpreter. They cannot be reused as variable identifiers, etc.

The following are valid commands (in the sense that they can be typed at the prompt:

(+ 2 3)
(exit)
(bindings) 
(define x 3) 
(define a (+ 4 b)) 
(define b (power -3 (sqrt (+ 3 c)))) 
(defun (add x y) (+ x y))
(defun (bigsum a b c d e) (+ a (+ b (+ c (+ d e)))))
(defun (- x y) (power x y)) 
            

While it is acceptable (as shown in the last example above) to give new definitions to the built-in operators, since the syntax specification states that they are binary operators, any such redefinition must still be as a binary function.

The Provided Parser

The syntax of expressions and commands in this language is built on what are commonly called s-expressions. An s-expression is a recursively defined structure that is either a symbol (textual or numerical) or a sequence of s-expressions separated by white space and enclosed in parentheses.

You are provided with a structure, SexpParser (of signature SEXPPARSER), which includes the function parseStringForSexp, which takes a string and converts it to an SML data structure of type sexp, which type is defined in the CONCSYN signature discussed below.

The complete signature for the parser is as follows:

signature SEXPPARSER =
sig
 
    type sexp;
 
    exception ParseEOF
    exception ParseError of string
 
    val parseStringForSxpl : string -> sexp list
    val parseStringForSexp : string -> sexp
 
    val parseFile : string -> sexp list
 
    val parseStreamForSexp : TextIO.instream -> sexp
  
end  (* signature SEXPPARSER *)
            

The ParseEOF exception is raised if the string being parsed is empty. The ParseError exception is raised if the string is not a valid s-expression (for example, if it contains unbalanced parentheses).

The Abstract Syntax of the Language

 

As we discuss in class, there are various advantages to separating the concrete syntax which the user uses for typing in programs, from the abstract syntax which captures the essential structures of the language. This is particularly true because abstract syntax is very easy to represent with SML data structures.

For example, with the expression (+ (* 3 4) 2) the following representation is much closer to the underlying meaning and is therefore easier to manipulate:

         Cappl (Operator "+",
                [Cappl (Operator "*",
                        [Cdata (Int 3),
                         Cdata (Int 4)]),
                 Cdata (Int 2)])
         

Similarly, the essence of the command (defun (add1 x) (+ x 1)) is captured in the structure:

         Cdefun (Operator "add1",
                 [Vname "x"],
                 Cappl (Operator "+",
                        [Cvar (Vname "x"),Cdata (Int 1)]))
         

Expressions and commands are represented after conversion by different ML data types: cexpression and ccommand (for "Calculator Expression" and "Calculator Command"). Because it is necessary for the syntax converter to return just a single type, and the user can type either an expression or a command at the input, the user input is always converted to a ccommand, but one of the cases for that type is just a wrapper for an underlying cexpression, as will be seen below.

The full abstract syntax for the language is given by the following signature:

signature ABSYN =
sig
    datatype vname = Vname of string;
    datatype operator = Operator of string
 
    type datavalue
 
    datatype cexpression =
        Cdata of datavalue
      | Cvar of vname
      | Cappl of operator * cexpression list
 
    datatype ccommand =
        Cexpression of cexpression
      | Cdefine of vname * cexpression
      | Cdefun of operator * vname list * cexpression
      | Cbindings
      | Cexit
 
    val datavalueToString : datavalue -> string
    val cexpressionToString : cexpression -> string
end;
            

Notice that the abstract syntax stops at the point of specifying the datavalues. We will use different datavalue definitions as our language grows and gets richer over the next few projects. For now we will just be using numbers, which you can assume are defined by the data type

datatype number = Int of int
       | Real of real
            

The number type is defined as part of a signature called CONCSYN, which provides other bits you don't need to worry about in this phase. Your functors will, however, need to take a structure of signature CONCSYN in order to get the definition for the type number.

You'll notice that the ABSYN signature fully specifies the implementation details of many of the types. This is necessary as we will need to be able to access internals of the data structures in many of the routines making use of structures of this signature. The alternative would have been to leave the type less specified and instead to specify a large set of accessor and constructor functions, but that would have been fairly cumbersome.

The Provided Syntax Converter

For this phase of the development, in addition to the parser which converts strings to a concrete representation in the type sexp, you will also be given a structure SyntaxConverter (of signature SYNTAXCONVERTER) containing functions to convert from concrete syntax (stored in an sexp) to abstract syntax (stored in a ccommand or cexpression). Thus, most of the functions you write for this phase will only be manipulating structures in the abstract syntax data type. In the next phase you will build the syntax conversion functions for yourself.

The complete signature for the syntax converter is as follows:

signature SYNTAXCONVERTER =
sig
    type sexp;
    type cexpression;
    type ccommand;
 
    exception SyntaxError of string;
    exception InternalError of string;
 
    val convertExpression : sexp -> cexpression;
    val convertCommand : sexp -> ccommand;
end;
            

The exception SyntaxError is raised if the given s-expression is not a valid cexpression or ccommand (as appropriate). The exception InternalError should not occur in practice, but must be handled just in case.

The Evaluator

 

The first of your two main tasks for this phase is to supply a functor which exports a single function: evaluateCexpression. As its name implies, this function takes a cexpression and evaluates it, returning a datavalue (unless an error occurs, in which case a RuntimeError is raised).

Of course, since expressions may refer to global variables, functions, and (when evaluating the application of a user-defined function) parameters, the evaluator will need access to all these things. Therefore, the full type for evaluateCexpression is as given in the following signature:

signature EVALUATOR 
  = sig
      type cexpression;
      type vname;
      type datavalue;
      type operator;
            
      exception RuntimeError of string;
      exception InternalError of string;
            
      val evaluateCexpression : 
               (vname * datavalue) list -> 
                 (operator * vname list * cexpression) list -> 
                   cexpression -> 
                     datavalue;
    end;
         
            

The first parameter of evaluateCexpression is the current list of variable definitions. It holds all the global variables. If we are in the midst of evaluating the application of a function, it also holds the definitions of the formal parameters of the functions (ie, the formal parameter names paired with the actual parameter values). The second parameter is a list of the current function definitions. It only grows when a new function is defined at the top-level of the interpreter. The third parameter is the expression to be evaluated.

You should implement this signature using a functor with the following heading:

functor EvaluatorFun(structure AbstractSyntax : ABSYN
                     structure ConcreteSyntax : CONCSYN
                        sharing type AbstractSyntax.datavalue =
                                     ConcreteSyntax.number) =
struct
 
    type cexpression = AbstractSyntax.cexpression
    type datavalue = AbstractSyntax.datavalue
    type vname = AbstractSyntax.vname
    type operator = AbstractSyntax.operator

The concrete syntax structure is needed because it holds the definitions of the constructors used for datavalues (numbers).

Applying Built-In Operators

Whenever you encounter an application of a built-in operator you must:

  1. Evaluate each of the two expressions being used as actual parameters, reducing each to a datavalue.

  2. Perform the appropriate arithmetic on the two resulting values.

Note that the language supports both integer and real values but that the operators are untyped. Your evaluator should be written to keep integers as integers for as long as possible, but to automatically cast them to reals when they are paired with a real as the other operand of some built-in operator. You do not need to test whether a real has any decimal part. So, for instance, (+ 3.0 2) evaluates to 5.0. However, if one integer is divided by another it should result in a real result if necessary, but an integer if the division can be done correctly using integer division.

In the calculator language, division by zero should always be a runtime error. However, due to SML 96 using IEEE floating point for real arithmetic, while integer division by 0 will raise the exception Div, real division by 0.0 returns either inf (if the numerator is non-zero) or nan (if the numerator is 0.0) rather than raise an exception. Therefore, you should check for all attempts at division by zero yourself, and raise RuntimeError rather than attempt the division.

Note, you may find it useful to define a function like:

val applyBuiltIn = fn : operator -> datavalue list -> datavalue 

(where the second parameter will always be a list of length two), which implements all the basic arithmetic.

Recall that, as mentioned above, the binary operators are not keywords, and the user may define new binary functions with the same names, and that these new definitions take precedence over the original definitions.

Applying User-Defined Functions  

Whenever you encounter an application where the operator is not one of the built-in binary operators, you must:

  1. Look up the definition of the operator in the list of user-defined functions.

  2. Evaluate each of the expressions being used as actual parameters, reducing each to a datavalue.  

  3. Set up a list of bindings of the formal parameter names (from the function definition) to the values computed for the actual parameters in the last step.  

  4. Evaluate the expression for the body of the function, in an environment that includes the bindings set up in the last step.

Note that the number of formal parameters must match the number of expressions sent as actual parameters. If not, a RuntimeError should be raised. (This error is also raised if you ever need the value of a variable and can't find a defintion for it.)

Note that whenever you look up the definition of a user-defined function, you search sequentially from the beginning of the list of function definitions. This means that new function definitions override old definitions for the same name, even in old usage. So, for example, if we define the functionf, then define the function g in terms of f if we then redefinef, the function g will make use of that new definition. This behavior differs from the static (lexical) scope used in SML.

The User Interface

 

The other part you must write for this project is the actual user interface for the interpreter. This should be in the form of a functor implementing the following signature:

signature USERINTERFACE =
sig
    val topLevel : unit -> unit
end;

This one exported function should be implemented as a call to the actual read-eval-print loop which takes a list of variable definitions and a list of function definitions as parameters. This function will look similar to the read-eval-print loop function from project 3, but with the two definition lists taking the place of the operand stack.

In the initial call to this loop function from topLevel, these two lists are, obviously, empty. When the user enters a define command, a binding for the new variable (assigned the result of evaluating the expression they define it to) should be added to the front of the variable-definitions list in the recursive call. The function-definitions list is similarly augmented in the recursive call when a defun command is encountered.

When called, this function should display a prompt and get a line of input from the user using the SML built-in function TextIO.inputLine. You will also need to flush the output using TextIO.flushOut after displaying the prompt and before reading the input. The input string should be passed to the provided parser and the result then passed to the provided syntax converter function convertCommand.

If the result was a Cexit, just return (thereby exiting the system). If the result is a Cbindings print out a list of all the function and variable bindings. If it is a Cdefine evaluate the given expression, echo the resulting definition to the user, and bind the given variable to the result in the subsequent call to the loop. If it is a Cdefun, echo the definition to the user and go into the next loop with the list of function definitions augmented by this new definition. Finally, if the command is just a Cexpression, evaluate it and print the result.

If the user redefines a variable or function, it is not necessary to remove or change the old definition. Simply attach the new definition to the front of the list and make sure that when you search for definitions you take the first one you encounter. This also makes it possible for formal parameters to shadow the defintions of global variables during the evaluation of a function application. The implement ation of the bindings commandmust, of course, print only the active definitions. Definitions that have been shadowed by re-definitions should not be shown.

Note that the AbstractSyntax structure provides functions you can use in producing output for the user.

Your interpreter should be prepared to handle syntax and internal errors raised by both the parser and the syntax converter, as well as runtime and internal errors produced by your evaluator.

The functor you write to solve this part should have the following header:

functor UserInterfaceFun(structure AbstractSyntax : ABSYN;
                         structure ConcreteSyntax : CONCSYN;
                           sharing type ConcreteSyntax.number =
                                        AbstractSyntax.datavalue;
                         structure SexpParser : SEXPPARSER;
                         structure SyntaxConverter : SYNTAXCONVERTER;
                           sharing type AbstractSyntax.ccommand = 
                                        SyntaxConverter.ccommand;
                           sharing type SexpParser.sexp = 
                                        SyntaxConverter.sexp;
                         structure Evaluator : EVALUATOR;
                           sharing type Evaluator.datavalue = 
                                        AbstractSyntax.datavalue; 
                           sharing type Evaluator.cexpression = 
                                        AbstractSyntax.cexpression; 
                           sharing type Evaluator.vname = 
                                        AbstractSyntax.vname; 
                           sharing type Evaluator.operator = 
                                        AbstractSyntax.operator;) =
            

(Note, if you want to use the function sexpressionToString from the ConcreteSyntax structure to convert sexpressions to strings (for use in error messages, for instance) then you will also need to add the constraint:

                           sharing type ConcreteSyntax.sexp = 
                                        SyntaxConverter.sexp
            

in the functor header above.)

What I have Provided:

 

I have created a version of the SML-NJ interpreter with the parser structure SexpParser, the two syntax structures ConcreteSyntax and AbstractSyntax, and the SyntaxConverter structure, and the signatures for the userinterface and evaluator functors already loaded. The executable (or a link to it) is in /cs/cs131/bin/sml-sexp. The file /cs/cs131/bin/sml-cs131 (which is the version loaded by emacs) has also been re-linked to this version.

(I have also loaded a sample solution to the evaluator functor and built a structure called testEvaluator as well. You can use the functions in this structure from the SML command line to see what the results of evaluation should be.)

I have also provided an executable sample solution at /cs/cs131/bin/sml-soln-proj4. You can use this to test how your solution should behave (from the standpoint of user interaction). This launches directly into my solution's interpreter.

We will test your solution (which should be a single file containing both functors and nothing else) by compiling it with the file /cs/cs131/proj_5/make_sml_soln.sml with your file used in place of the two .sml files currently used.

How To Proceed

 

The best plan of attack for this phase is to proceed incrementally. Begin by building your read-eval-print loop as a function from unit to unit, and write the evaluator so that it only deals with expressions built from numeric literals and the built-in operators.

Once this works entirely, modify the types of the read-eval-print loop and the evaluator so that they take the list of variable definitions. Implement the functionality for the define command in the read-eval-print loop, and modify the evaluator so that it knows how to deal with variables in expressions.

Finally, integrate the parameter for the list of function definitions, and implement the defun command and modify the evaluator so that it knows how to apply user-defined functions.

If you follow this course you are much more likely to complete the whole assignment. And, if you don't complete the whole assignment, you are more likely to have a partial solution that works, at least for those features it tries to implement. If you try to attack the entire problem at once, you are more likely to end up with a submission that doesn't do anything right (including compile).


This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, February 21, 2000.
http://www.cs.hmc.edu/~hodas/courses/cs131/projects/project05.html