|
|
|
Purpose: |
|
|
|
In this project you will begin the construction of the interpreter for a simple calculator language supporting typical mathematical operations, storage in named variables, and user-defined functions. This interpreter will be built over a series of three projects. In the first two phases you will implement a simplified version of the language supporting only numerical computation. In the third phase you will enrich the language with a string data type, as well as conditional evaluation and other features. In later projects you will further advance the language by adding modern data structures and scoping rules. Throughout this development process, the code for the interpreter will be divided into three major parts, each implemented by its own SML functor:
In this first phase of constructing the interpreter you
will build initial versions of the second and third of these
modules. |
The Concrete Syntax of the Language |
|
|
|
At the input prompt, the user may type either an expression, or a command. If they type an expression, it is simply evaluated and the result printed. If they type a command, then the command is executed, and the user is returned to the input prompt. The command, however, may effect the context in which future expressions and commands are processed (by, for example, defining new functions that can be used in expressions). |
|
Expressions |
Expressions have the following syntax: binary-op ::= "+" | "-" | "*" | "/"
exp-list ::= epsilon
| exp exp-list
exp ::= integer
| real
| id
| "(" binary-op exp exp ")"
| "(" id exp-list ")"
Thus they are either numeric literals, identifiers (i.e.
variables), the application of a built-in prefix binary
operator to a pair of expressions, or the application of a
user defined operator to a (possibly empty) list of
arguments ( Identifiers are constructed of one or more characters
from a rich set of characters (as specified in
The following are all valid expressions: 3 a (+ 3 2) (* a (+ 4 b)) (add1 -3.34) (power 3 -4) (power -3 (sqrt (+ 3 c))) (bigsum 3.4 5 -2 45 13) (- + *) Because variable identifiers and function identifiers
live in entirely different name spaces, in this last example
the meaning is the function - applied to the
variables + and *, where those variables
are just like any other variables holding numeric
values. |
|
Commands |
There are four commands, all of which can be used only at
the interpreter's prompt. They are Anything typed at the prompt that does not match one of these commands is assumed to be an expression to be evaluated and its result printed. Thus the syntax for acceptable inputs at the prompt is: input ::= exp
| "(" "exit" ")"
| "(" "bindings" ")"
| "(" "define" id exp ")"
| "(" "defun" "(" id-list ")" exp ")"
Note that the four commands are reserved words, and should not be allowed to occur other than in their use as commands at the top-level interpreter. They cannot be reused as variable identifiers, etc. The following are valid commands (in the sense that they can be typed at the prompt: (+ 2 3) (exit) (bindings) (define x 3) (define a (+ 4 b)) (define b (power -3 (sqrt (+ 3 c)))) (defun (add x y) (+ x y)) (defun (bigsum a b c d e) (+ a (+ b (+ c (+ d e))))) (defun (- x y) (power x y)) While it is acceptable (as shown in the last example above) to give new definitions to the built-in operators, since the syntax specification states that they are binary operators, any such redefinition must still be as a binary function. |
|
The Provided Parser |
The syntax of expressions and commands in this language is built on what are commonly called s-expressions. An s-expression is a recursively defined structure that is either a symbol (textual or numerical) or a sequence of s-expressions separated by white space and enclosed in parentheses. You are provided with a structure, SexpParser (of signature SEXPPARSER), which includes the function parseStringForSexp, which takes a string and converts it to an SML data structure of type sexp, which type is defined in the CONCSYN signature discussed below. The complete signature for the parser is as follows: signature SEXPPARSER = sig type sexp; exception ParseEOF exception ParseError of string val parseStringForSxpl : string -> sexp list val parseStringForSexp : string -> sexp val parseFile : string -> sexp list val parseStreamForSexp : TextIO.instream -> sexp end (* signature SEXPPARSER *) The ParseEOF exception is raised if the string being parsed is empty. The ParseError exception is raised if the string is not a valid s-expression (for example, if it contains unbalanced parentheses). |
The Abstract Syntax of the Language |
|
|
|
As we discuss in class, there are various advantages to separating the concrete syntax which the user uses for typing in programs, from the abstract syntax which captures the essential structures of the language. This is particularly true because abstract syntax is very easy to represent with SML data structures. For example, with the expression Cappl (Operator "+", [Cappl (Operator "*", [Cdata (Int 3), Cdata (Int 4)]), Cdata (Int 2)]) Similarly, the essence of the command Cdefun (Operator "add1", [Vname "x"], Cappl (Operator "+", [Cvar (Vname "x"),Cdata (Int 1)])) Expressions and commands are represented after conversion by different ML data types: cexpression and ccommand (for "Calculator Expression" and "Calculator Command"). Because it is necessary for the syntax converter to return just a single type, and the user can type either an expression or a command at the input, the user input is always converted to a ccommand, but one of the cases for that type is just a wrapper for an underlying cexpression, as will be seen below. The full abstract syntax for the language is given by the following signature: signature ABSYN = sig datatype vname = Vname of string; datatype operator = Operator of string type datavalue datatype cexpression = Cdata of datavalue | Cvar of vname | Cappl of operator * cexpression list datatype ccommand = Cexpression of cexpression | Cdefine of vname * cexpression | Cdefun of operator * vname list * cexpression | Cbindings | Cexit val datavalueToString : datavalue -> string val cexpressionToString : cexpression -> string end; Notice that the abstract syntax stops at the point of specifying the datavalues. We will use different datavalue definitions as our language grows and gets richer over the next few projects. For now we will just be using numbers, which you can assume are defined by the data type datatype number = Int of int | Real of real The number type is defined as part of a signature called CONCSYN, which provides other bits you don't need to worry about in this phase. Your functors will, however, need to take a structure of signature CONCSYN in order to get the definition for the type number. You'll notice that the ABSYN signature fully specifies the implementation details of many of the types. This is necessary as we will need to be able to access internals of the data structures in many of the routines making use of structures of this signature. The alternative would have been to leave the type less specified and instead to specify a large set of accessor and constructor functions, but that would have been fairly cumbersome. |
|
The Provided Syntax Converter |
For this phase of the development, in addition to the parser which converts strings to a concrete representation in the type sexp, you will also be given a structure SyntaxConverter (of signature SYNTAXCONVERTER) containing functions to convert from concrete syntax (stored in an sexp) to abstract syntax (stored in a ccommand or cexpression). Thus, most of the functions you write for this phase will only be manipulating structures in the abstract syntax data type. In the next phase you will build the syntax conversion functions for yourself. The complete signature for the syntax converter is as follows: signature SYNTAXCONVERTER = sig type sexp; type cexpression; type ccommand; exception SyntaxError of string; exception InternalError of string; val convertExpression : sexp -> cexpression; val convertCommand : sexp -> ccommand; end; The exception SyntaxError is raised if the given s-expression is not a valid cexpression or ccommand (as appropriate). The exception InternalError should not occur in practice, but must be handled just in case. |
The Evaluator |
|
|
|
The first of your two main tasks for this phase is to
supply a functor which exports a single function:
Of course, since expressions may refer to global
variables, functions, and (when evaluating the application
of a user-defined function) parameters, the evaluator will
need access to all these things. Therefore, the full type
for signature EVALUATOR = sig type cexpression; type vname; type datavalue; type operator; exception RuntimeError of string; exception InternalError of string; val evaluateCexpression : (vname * datavalue) list -> (operator * vname list * cexpression) list -> cexpression -> datavalue; end; The first parameter of You should implement this signature using a functor with the following heading: functor EvaluatorFun(structure AbstractSyntax : ABSYN structure ConcreteSyntax : CONCSYN sharing type AbstractSyntax.datavalue = ConcreteSyntax.number) = struct type cexpression = AbstractSyntax.cexpression type datavalue = AbstractSyntax.datavalue type vname = AbstractSyntax.vname type operator = AbstractSyntax.operator The concrete syntax structure is needed because it holds the definitions of the constructors used for datavalues (numbers). |
|
Applying Built-In Operators |
Whenever you encounter an application of a built-in operator you must:
Note that the language supports both integer and real
values but that the operators are untyped. Your evaluator
should be written to keep integers as integers for as long
as possible, but to automatically cast them to reals when
they are paired with a real as the other operand of some
built-in operator. You do not need to test whether a real
has any decimal part. So, for instance, In the calculator language, division by zero should
always be a runtime error. However, due to SML 96 using IEEE
floating point for real arithmetic, while integer division
by 0 will raise the exception Note, you may find it useful to define a function like: val applyBuiltIn = fn : operator -> datavalue list -> datavalue (where the second parameter will always be a list of length two), which implements all the basic arithmetic. Recall that, as mentioned above, the binary operators are not keywords, and the user may define new binary functions with the same names, and that these new definitions take precedence over the original definitions. |
|
Applying User-Defined Functions |
Whenever you encounter an application where the operator is not one of the built-in binary operators, you must:
Note that the number of formal parameters must match the
number of expressions sent as actual parameters. If not, a
Note that whenever you look up the definition of a user-defined function, you search sequentially from the beginning of the list of function definitions. This means that new function definitions override old definitions for the same name, even in old usage. So, for example, if we define the functionf, then define the function g in terms of f if we then redefinef, the function g will make use of that new definition. This behavior differs from the static (lexical) scope used in SML. |
The User Interface |
|
|
|
The other part you must write for this project is the actual user interface for the interpreter. This should be in the form of a functor implementing the following signature: signature USERINTERFACE = sig val topLevel : unit -> unit end; This one exported function should be implemented as a call to the actual read-eval-print loop which takes a list of variable definitions and a list of function definitions as parameters. This function will look similar to the read-eval-print loop function from project 3, but with the two definition lists taking the place of the operand stack. In the initial call to this loop function from
topLevel, these two lists are, obviously, empty.
When the user enters a When called, this function should display a prompt and
get a line of input from the user using the SML built-in
function If the result was a If the user redefines a variable or function, it is not
necessary to remove or change the old definition. Simply
attach the new definition to the front of the list and make
sure that when you search for definitions you take the first
one you encounter. This also makes it possible for formal
parameters to shadow the defintions of global variables
during the evaluation of a function application. The
implement ation of the Note that the Your interpreter should be prepared to handle syntax and internal errors raised by both the parser and the syntax converter, as well as runtime and internal errors produced by your evaluator. The functor you write to solve this part should have the following header: functor UserInterfaceFun(structure AbstractSyntax : ABSYN; structure ConcreteSyntax : CONCSYN; sharing type ConcreteSyntax.number = AbstractSyntax.datavalue; structure SexpParser : SEXPPARSER; structure SyntaxConverter : SYNTAXCONVERTER; sharing type AbstractSyntax.ccommand = SyntaxConverter.ccommand; sharing type SexpParser.sexp = SyntaxConverter.sexp; structure Evaluator : EVALUATOR; sharing type Evaluator.datavalue = AbstractSyntax.datavalue; sharing type Evaluator.cexpression = AbstractSyntax.cexpression; sharing type Evaluator.vname = AbstractSyntax.vname; sharing type Evaluator.operator = AbstractSyntax.operator;) = (Note, if you want to use the function
sharing type ConcreteSyntax.sexp = SyntaxConverter.sexp in the functor header above.) |
What I have Provided: |
|
|
|
I have created a version of the SML-NJ interpreter with
the parser structure (I have also loaded a sample solution to the evaluator functor and built a structure called testEvaluator as well. You can use the functions in this structure from the SML command line to see what the results of evaluation should be.) I have also provided an executable sample solution at
We will test your solution (which should be a single file
containing both functors and nothing else) by compiling it
with the file
|
How To Proceed |
|
|
|
The best plan of attack for this phase is to proceed incrementally. Begin by building your read-eval-print loop as a function from unit to unit, and write the evaluator so that it only deals with expressions built from numeric literals and the built-in operators. Once this works entirely, modify the types of the
read-eval-print loop and the evaluator so that they take the
list of variable definitions. Implement the functionality
for the Finally, integrate the parameter for the list of function
definitions, and implement the If you follow this course you are much more likely to
complete the whole assignment. And, if you don't complete
the whole assignment, you are more likely to have a partial
solution that works, at least for those features it tries to
implement. If you try to attack the entire problem at once,
you are more likely to end up with a submission that doesn't
do anything right (including compile). |
|
|
This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, February 21, 2000. |
http://www.cs.hmc.edu/~hodas/courses/cs131/projects/project05.html |
|