| |
Purpose: | |
|
In the next two projects you will build an interpreter for a simple
language. While it is really just a simple expression calculator, by
the time you're done it will contain the crucial elements of any
programming language: variables and functions.
In this first phase you will build the shell of the interpreter (the
read-eval-print loop in charge of all communication with the user),
and the module which converts from the concrete syntax of the language
(as typed by the user and returned by the parser) to the abstract
syntax (which you will manipulate in your evaluator in the next
phase).
| |
The Syntax of the Language | |
The calculator language is divided into two parts: expressions and commands.
| |
| Expressions |
Expressions have the following syntax:
binary-op ::= "+" | "-" | "*" | "/"
exp-list ::= epsilon
| exp exp-list
exp ::= integer
| real
| id
| "(" binary-op exp exp ")"
| "(" id exp-list ")"
Thus they are either numeric literals, identifiers (i.e. variables),
the application of a built-in prefix binary operator to a pair of
expressions, or the application of a user defined operator to a
(possibly empty) list of arguments (epsilon is the empty
input). The application of a built-in or user-defined operator to a
list of arguments is denoted by wrapping the application in
parentheses (which are not optional).
Identifiers are constructed of one or more characters from a rich set of
characters (as specified in
So, for example, the following are all valid expressions: 3 a (+ 3 2) (* a (+ 4 b)) (add1 -3.34) (power 3 -4) (bigsum 3.4 5 -2 45 13) (- + *)
|
| Commands |
There are four commands, all of which can be used only at the
interpreter's prompt. They are exit, which closes the
interpreter, bindings, which causes the interpreter to
display a list of the current variable and function bindings,
define, which takes a variable name and an expression,
evaluates the expresion, and binds the variable to the result, and
defun, which takes an operator name, a list of formal
parameters, and an expression for the body of the function, and binds
the operator name to that definition. All of the commands are used
with enclosing parentheses. Anything typed at the prompt that does not match one of these commands is assumed to be an expression to be evaluated and its result printed. Thus the syntax for acceptable inputs at the prompt is:
input ::= exp
| "(" "exit" ")"
| "(" "bindings" ")"
| "(" "define" id exp ")"
| "(" "defun" "(" id-list ")" exp ")"
Note that the four commands are reserved, and should not be allowed to occur other than in their use as commands at the top-level interpreter. They cannot be reused as variable identifiers, etc.
|
Abstract Syntax | |
|
When we get around to having to evaluate expressions in the next
project, it will be much nicer if the user's input has already been
transformed into an abstraction of the underlying expression, rather
than having to deal with the concrete syntax directly.
For example, with the expression
Cappl (Operator "+",
[Cappl (Operator "*",
[Cdata (Int 3),
Cdata (Int 4)]),
Cdata (Int 2)])
Similarly, the essence of the command (defun (add1 x) (+ x 1))
is captured in the structure:
Cdefun (Operator "add1",
[Vname "x"],
Cappl (Operator "+",
[Cvar (Vname "x"),Cdata (Int 1)]))
Your main task for this phase is to supply a functor which converts from the (almost) concrete syntax of sexpressions (as returned by the parser described below) to the abstract syntax of "calculator expressions". That full abstract syntax is given by the following signature (Notice that the abstract syntax stops at the point of specifying the datavalues. We will use different datavalue definitions as our language grows and gets richer over the next few projects. For now we will just be using numbers, and we will take the definition of numbers from the concrete syntax structure described below.):
signature ABSYN =
sig
datatype vname = Vname of string;
datatype operator = Operator of string
type datavalue
datatype cexpression =
Cdata of datavalue
| Cvar of vname
| Cappl of operator * cexpression list
datatype ccommand =
Cexpression of cexpression
| Cdefine of vname * cexpression
| Cdefun of operator * vname list * cexpression
| Cbindings
| Cexit
val datavalue_to_string : datavalue -> string
val cexpression_to_string : cexpression -> string
end;
You'll notice that the signature specifies the implementation details of alot of the types. This is necessary as we will need to be able to access internals of the dat structures in many of the routines making use of structures of this signature. The alternative would have been to leave the type less specified and specify a large set of accessor and constructor functions, but that would have been fairly cumbersome. This signature has been implemented by the following functor:
functor AbstractSyntaxFun (structure ConcreteSyntax : CONCSYN) =
struct
type datavalue = ConcreteSyntax.number;
datatype vname = Vname of string;
datatype operator = Operator of string
datatype cexpression =
Cdata of datavalue
| Cvar of vname
| Cappl of operator * cexpression list
datatype ccommand =
Cexpression of cexpression
| Cdefine of vname * cexpression
| Cdefun of operator * vname list * cexpression
| Cbindings
| Cexit
val datavalue_to_string = ConcreteSyntax.number_to_string;
fun cexpression_to_string (Cdata data) = datavalue_to_string data
| cexpression_to_string (Cvar (Vname name)) = name
| cexpression_to_string (Cappl (Operator opr, cexps)) =
"(" ^ opr ^ (cexp_list_to_string cexps) ^ ")"
and cexp_list_to_string nil = ""
| cexp_list_to_string (ce::ces) =
" " ^ cexpression_to_string ce ^ cexp_list_to_string ces
end;
Note that in order to get access to the number datatype
defined in the concrete syntax implementation, the concrete syntax
structure is taken as an argument to this functor.
| |
The Provided Parser: | |
|
As described above, the concrete syntax of this language is written
using parentheses, with operators coming in prefix position. This is
the syntax of Lisp, and many of the languages that followed from
it.
In fact, the valid expressions and commands of our language are just a
subset of the valid parenthesized expressions constructed from numbers
and symbols (identifiers). This syntax is generalized as the syntax
of s-expression, as explained in class. In order to save you the
hassle of working directly with strings and low-level parsing, I have
built a parser for s-expressions and provided it for you to use. The
source code for the parser and support code is in
The only files from the parser that you really need to understand are
signature CONCSYN =
sig
datatype sexp =
Atom of atom
| $ of (sexp * sexp)
and atom =
Nil
| Symbol of string
| Number of number
and number =
Int of int
| Real of real
val number_to_string : number -> string
val atom_to_string : atom -> string
val sexp_to_string : sexp -> string
end (* signature CONCSYN *)
These types have been or will be explained in class. Notice that I have provided functions to build strings out of objects of any of the types. You should use these for displaying output or error messages.
The second file,
signature SEXPPARSER =
sig
type sexp;
exception Parse_EOF
exception Parse_Error of string
val parse_string_for_sxpl : string -> sexp list
val parse_string_for_sexp : string -> sexp
val parse_file : string -> sexp list
val parse_stream_for_sexp : TextIO.instream -> sexp
end (* signature SEXPPARSER *)
For your purposes, the most important function exported by the
- SexpParser.parse_string_for_sexp "(a b (c d) e)";
val it = Atom (Symbol "a") $
Atom (Symbol "b") $
(Atom (Symbol "c") $ Atom (Symbol "d") $ Atom Nil) $
Atom (Symbol "e") $
Atom Nil
: SexpParser.sexp
| |
Notice that all these structures are terminated by Atom
Nil. So, for instance, the embedded sexpression (c
d) was parsed as (Atom (Name "c") $ Atom (Name
"d") $ Atom Nil). Also, note that $ has
been defined as a right associative infix constructor (like
::). The only time you will get a series of expressions
separated by $ and not terminated with Atom
Nil is if the user has typed something using the dot notation
for sexpressions. This will always be a syntax error (until we start
accepting sexpressions as data a couple of projects from now).
If the entire string does not form a valid s-expression then the
parser will raise the exception
So for example, the user-typed expression
Atom (Symbol "+") $
( Atom (Symbol "*") $
Atom (Number (Int 3)) $
Atom (Number (Int 4)) $
Atom Nil ) $
Atom (Number (Int 2)) $
Atom Nil
This is the expression that you will then convert to the abstract
representation:
Cappl (Operator "+",
[Cappl (Operator "*",
[Cdata (Int 3),
Cdata (Int 4)]),
Cdata (Int 2)])
| |
What You Must Write: | |
|
Your assignment is to provide functors implementing two signatures:
| |
| The Syntax Converter |
You must write a functor named SyntaxConverter which
implements the following signature:
signature SYNTAXCONVERTER =
sig
type sexp;
type cexpression;
type ccommand;
exception Syntax_Error of string;
exception Internal_Error of string;
val convert_expression : sexp -> cexpression;
val convert_command : sexp -> ccommand;
end;
The key part is that it should implement functions which, given an
sexpression structure as created by the parser, convert it into a
The idea is that your interpreter will pass
The function The functor you write to solve this part should have the following overall shape:
functor SyntaxConverterFun(structure ConcreteSyntax : CONCSYN;
structure AbstractSyntax : ABSYN;
sharing type ConcreteSyntax.number =
AbstractSyntax.datavalue) =
struct
...
end
The first few lines (after the
|
| The Interpreter |
The other part you must write for this project is the actual
interpreter that interacts with the user. This should be in the form
of a functor implementing the following signature:
signature INTERPRETER =
sig
val top_level : unit -> unit
end;
In order to make subsequent projects easier, it would be best if you
implemented this function just as a call to another
If the result is a Your interpreter should be prepared to handle syntax and internal errors raised by both the parser and your syntax converter. The functor you write to solve this part should have the following header:
functor InterpreterFun(structure AbstractSyntax : ABSYN;
structure ConcreteSyntax : CONCSYN;
sharing type ConcreteSyntax.number =
AbstractSyntax.datavalue;
structure SexpParser : SEXPPARSER;
structure SyntaxConverter : SYNTAXCONVERTER;
sharing type AbstractSyntax.ccommand =
SyntaxConverter.ccommand;
sharing type SexpParser.sexp =
SyntaxConverter.sexp)
(Note, if you want to use the function
sharing type ConcreteSyntax.sexp =
SyntaxConverter.sexp;
before the last sharing constraint in the functor header above.)
We will test your solution (which should be a single file containing
both functors and nothing else) by compiling it with the file
|
What I have Provided: | |
I have created a version of the SML-NJ interpreter with the parser
structure SexpParser and the two syntax structures
ConcreteSyntax and AbstractSyntax already
loaded. The executable (or a link to it) is in
/cs/cs131/bin/sml-sexp. The file
/cs/cs131/bin/sml-cs131 (which is the version loaded by
emacs) has been re-linked to this version.
I have also provided an executable sample solution at
Finally, I have also created val parse = SexpParser.parse_string_for_sexp; val conv_exp = SyntaxConverter.convert_expression; val conv_com = SyntaxConverter.convert_command;
| |
This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Wednesday, Mar 3, 1999 at 1:45 PM. | |
http://cs.hmc.edu/~hodas/courses/cs131/projects/project04.html | |