Project 04

Interpreter for a Simple Calculator Language
Phase 01 -- From Concrete to Abstract Syntax

Due by 5:00 PM on Thursday, October 21, 1999

 

Purpose:

  In the next two projects you will build an interpreter for a simple language. While it is really just a simple expression calculator, by the time you're done it will contain the crucial elements of any programming language: variables and functions. In this first phase you will build the shell of the interpreter (the read-eval-print loop in charge of all communication with the user), and the module which converts from the concrete syntax of the language (as typed by the user and returned by the parser) to the abstract syntax (which you will manipulate in your evaluator in the next phase).

The Syntax of the Language

 

The calculator language is divided into two parts: expressions and commands.

Expressions Expressions have the following syntax:
    binary-op ::= "+" | "-" | "*" | "/"
 
    exp-list ::= epsilon
               | exp exp-list
 
    exp ::= integer
          | real
          | id
          | "(" binary-op exp exp ")"
          | "(" id exp-list ")"
Thus they are either numeric literals, identifiers (i.e. variables), the application of a built-in prefix binary operator to a pair of expressions, or the application of a user defined operator to a (possibly empty) list of arguments (epsilon is the empty input). The application of a built-in or user-defined operator to a list of arguments is denoted by wrapping the application in parentheses (which are not optional).

Identifiers are constructed of one or more characters from a rich set of characters (as specified in /cs/cs131/proj_4/sexp-parser/sexp_lex) with no restriction on order of characters. Numbers are written with optional leading unary minus and optional decimal part. (if a decimal point is used, there must be digits before and after the point.) Because of the free syntax of identifiers, spacing is important in source programs. Running identifiers or numbers together with operators will likely lead to a valid identifier, rather than an application. For example (-3e) is the application of the identifier -3e to no arguments; it is not the same as the expression (- 3 e), which subtracts e from 3.

So, for example, the following are all valid expressions:

 
3 
a 
(+ 3 2) 
(* a (+ 4 b)) 
(add1 -3.34) 
(power 3 -4)
(bigsum 3.4 5 -2 45 13)
(- + *) 

Commands There are four commands, all of which can be used only at the interpreter's prompt. They are exit, which closes the interpreter, bindings, which causes the interpreter to display a list of the current variable and function bindings, define, which takes a variable name and an expression, evaluates the expresion, and binds the variable to the result, and defun, which takes an operator name, a list of formal parameters, and an expression for the body of the function, and binds the operator name to that definition. All of the commands are used with enclosing parentheses.

Anything typed at the prompt that does not match one of these commands is assumed to be an expression to be evaluated and its result printed.

Thus the syntax for acceptable inputs at the prompt is:

    input ::= exp
            | "(" "exit" ")"
            | "(" "bindings" ")"
            | "(" "define" id exp ")"
            | "(" "defun" "(" id-list ")" exp ")"

Note that the four commands are reserved, and should not be allowed to occur other than in their use as commands at the top-level interpreter. They cannot be reused as variable identifiers, etc.

Abstract Syntax

  When we get around to having to evaluate expressions in the next project, it will be much nicer if the user's input has already been transformed into an abstraction of the underlying expression, rather than having to deal with the concrete syntax directly.

For example, with the expression (+ (* 3 4) 2) the following representation is much closer to the underlying meaning and therefore easier to manipulate:

         Cappl (Operator "+",
                [Cappl (Operator "*",
                        [Cdata (Int 3),
                         Cdata (Int 4)]),
                 Cdata (Int 2)])
Similarly, the essence of the command (defun (add1 x) (+ x 1)) is captured in the structure:
         Cdefun (Operator "add1",
                 [Vname "x"],
                 Cappl (Operator "+",
                        [Cvar (Vname "x"),Cdata (Int 1)]))

Your main task for this phase is to supply a functor which converts from the (almost) concrete syntax of sexpressions (as returned by the parser described below) to the abstract syntax of "calculator expressions". That full abstract syntax is given by the following signature (Notice that the abstract syntax stops at the point of specifying the datavalues. We will use different datavalue definitions as our language grows and gets richer over the next few projects. For now we will just be using numbers, and we will take the definition of numbers from the concrete syntax structure described below.):

signature ABSYN =
sig
    datatype vname = Vname of string;
    datatype operator = Operator of string
 
    type datavalue
 
    datatype cexpression =
        Cdata of datavalue
      | Cvar of vname
      | Cappl of operator * cexpression list
 
    datatype ccommand =
        Cexpression of cexpression
      | Cdefine of vname * cexpression
      | Cdefun of operator * vname list * cexpression
      | Cbindings
      | Cexit
 
    val datavalue_to_string : datavalue -> string
    val cexpression_to_string : cexpression -> string
end;

You'll notice that the signature specifies the implementation details of alot of the types. This is necessary as we will need to be able to access internals of the dat structures in many of the routines making use of structures of this signature. The alternative would have been to leave the type less specified and specify a large set of accessor and constructor functions, but that would have been fairly cumbersome. This signature has been implemented by the following functor:

functor AbstractSyntaxFun (structure ConcreteSyntax : CONCSYN) =
struct
    type datavalue = ConcreteSyntax.number;
 
    datatype vname = Vname of string;
    datatype operator = Operator of string
 
    datatype cexpression =
        Cdata of datavalue
      | Cvar of vname
      | Cappl of operator * cexpression list
 
    datatype ccommand =
        Cexpression of cexpression
      | Cdefine of vname * cexpression
      | Cdefun of operator * vname list * cexpression
      | Cbindings
      | Cexit
 
    val datavalue_to_string = ConcreteSyntax.number_to_string;
 
    fun cexpression_to_string (Cdata data) = datavalue_to_string data
      | cexpression_to_string (Cvar (Vname name)) = name
      | cexpression_to_string (Cappl (Operator opr, cexps)) =
      "(" ^ opr ^ (cexp_list_to_string cexps) ^ ")"
 
    and cexp_list_to_string nil = ""
      | cexp_list_to_string (ce::ces) =
       " " ^ cexpression_to_string ce ^ cexp_list_to_string ces
end;
Note that in order to get access to the number datatype defined in the concrete syntax implementation, the concrete syntax structure is taken as an argument to this functor.

The Provided Parser:

  As described above, the concrete syntax of this language is written using parentheses, with operators coming in prefix position. This is the syntax of Lisp, and many of the languages that followed from it.

In fact, the valid expressions and commands of our language are just a subset of the valid parenthesized expressions constructed from numbers and symbols (identifiers). This syntax is generalized as the syntax of s-expression, as explained in class. In order to save you the hassle of working directly with strings and low-level parsing, I have built a parser for s-expressions and provided it for you to use. The source code for the parser and support code is in /cs/cs131/proj_4/sexp-parser/. The file make_sml-sexp.sml in that directory was used to build the parser, and will show you the inter-relationships between the files.

The only files from the parser that you really need to understand are concretesyntax.sig which specifies the datatypes used to store the parsed s-expressions, and sexpparser.sig, which outlines the functions and exceptions exported by the parser. The contents of concretesyntax.sig are:

signature CONCSYN =
sig
 
    datatype sexp =
        Atom of atom
      | $ of (sexp * sexp)
 
    and atom =
        Nil
      | Symbol of string
      | Number of number
 
    and number =
        Int of int
      | Real of real
 
    val number_to_string : number -> string
    val atom_to_string : atom -> string
    val sexp_to_string : sexp -> string
 
end  (* signature CONCSYN *)

These types have been or will be explained in class. Notice that I have provided functions to build strings out of objects of any of the types. You should use these for displaying output or error messages.

The second file, sexpparser.sig, contains:

signature SEXPPARSER =
sig
 
    type sexp;
 
    exception Parse_EOF
    exception Parse_Error of string
 
    val parse_string_for_sxpl : string -> sexp list
    val parse_string_for_sexp : string -> sexp
 
    val parse_file : string -> sexp list
 
    val parse_stream_for_sexp : TextIO.instream -> sexp
  
end  (* signature SEXPPARSER *)

For your purposes, the most important function exported by the SexpParser structure is SexpParser.parse_string_for_sexp : string -> sexp, which, given a string, returns an intermediate representation of its contents using the datatype sexp. (I say ``intermediate representation'' because it is so close to the underlying concrete syntax that it doesn't really qualify as abstract syntax.) For example:

- SexpParser.parse_string_for_sexp "(a b (c d) e)";
val it = Atom (Symbol "a") $
         Atom (Symbol "b") $
         (Atom (Symbol "c") $ Atom (Symbol "d") $ Atom Nil) $
         Atom (Symbol "e") $
         Atom Nil
         : SexpParser.sexp

  Notice that all these structures are terminated by Atom Nil. So, for instance, the embedded sexpression (c d) was parsed as (Atom (Name "c") $ Atom (Name "d") $ Atom Nil). Also, note that $ has been defined as a right associative infix constructor (like ::). The only time you will get a series of expressions separated by $ and not terminated with Atom Nil is if the user has typed something using the dot notation for sexpressions. This will always be a syntax error (until we start accepting sexpressions as data a couple of projects from now).

If the entire string does not form a valid s-expression then the parser will raise the exception Parse_Error with a string argument. The exception Parse_EOF will (I believe) only be raised if the string contains only white space. Both these exceptions should be handled by your interpreter.

So for example, the user-typed expression (+ (* 3 4) 2) will come back from the parser as:

         Atom (Symbol "+") $
         ( Atom (Symbol "*") $
           Atom (Number (Int 3)) $
           Atom (Number (Int 4)) $
           Atom Nil ) $
         Atom (Number (Int 2)) $
         Atom Nil
This is the expression that you will then convert to the abstract representation:
         Cappl (Operator "+",
                [Cappl (Operator "*",
                        [Cdata (Int 3),
                         Cdata (Int 4)]),
                 Cdata (Int 2)])

What You Must Write:

  Your assignment is to provide functors implementing two signatures:

The Syntax Converter You must write a functor named SyntaxConverter which implements the following signature:
signature SYNTAXCONVERTER =
sig
    type sexp;
    type cexpression;
    type ccommand;
 
    exception Syntax_Error of string;
    exception Internal_Error of string;
 
    val convert_expression : sexp -> cexpression;
    val convert_command : sexp -> ccommand;
end;

The key part is that it should implement functions which, given an sexpression structure as created by the parser, convert it into a cexpression, and a ccommand respectively.

The idea is that your interpreter will pass convert_command the results of parsing whatever the user types at the prompt. The function checks whether it is a command (starting with exit, bindings, define, defun) and if so that it is well-formed. If so, it builds the appropriate abstract syntax for the command, using a call to convert_expression on the expression in the body of a define or defun. If it is not a command (well- or ill-formed) then it is just passed to convert_expression and the result wrapped with Cexpression.

The function convert_expression won't actually be needed from outside and need not have been exported by the signature, but I included it so that I could talk about it.

The functor you write to solve this part should have the following overall shape:

functor SyntaxConverterFun(structure ConcreteSyntax : CONCSYN;
                           structure AbstractSyntax : ABSYN;
                              sharing type ConcreteSyntax.number =
                                           AbstractSyntax.datavalue)  =
struct

   ...

end

The first few lines (after the struct) should define the required types in terms of the types provided by the structures given to the functor as arguments.

The Interpreter The other part you must write for this project is the actual interpreter that interacts with the user. This should be in the form of a functor implementing the following signature:
signature INTERPRETER =
sig
    val top_level : unit -> unit
end;

In order to make subsequent projects easier, it would be best if you implemented this function just as a call to another unit -> unit function. When called, that function should display a prompt and get a line of input from the user using the built-in TextIO.inputLine. You will also need to flush the output using TextIO.flushOut after displaying the prompt and before reading the input. The input should be passed to the parser and the result then passed through your syntax converter.

If the result is a Cbindings just print that that command is not implemented in this phase. If it is a Cdefine, a Cdefun, or a Cexpression, pass the contents to a function which echoes it back to the user. Use the functions provided in AbstractSyntax. Do not just print the original input string. In any of those four cases, you should then recursively call your read-eval-print function to go back for more input. If the result was a Cexit, just return (thereby exiting the system).

Your interpreter should be prepared to handle syntax and internal errors raised by both the parser and your syntax converter.

The functor you write to solve this part should have the following header:

functor InterpreterFun(structure AbstractSyntax : ABSYN;
                       structure ConcreteSyntax : CONCSYN;
                         sharing type ConcreteSyntax.number =
                                      AbstractSyntax.datavalue;
                       structure SexpParser : SEXPPARSER;
                       structure SyntaxConverter : SYNTAXCONVERTER;
                         sharing type AbstractSyntax.ccommand = 
                                      SyntaxConverter.ccommand;
                         sharing type SexpParser.sexp = 
                                      SyntaxConverter.sexp)

(Note, if you want to use the function sexpression_to_string to convert sexpressions to strings (for use in error messages, for instance) then you will also need to add the constraint:

                         sharing type ConcreteSyntax.sexp = 
                                      SyntaxConverter.sexp;

before the last sharing constraint in the functor header above.)

We will test your solution (which should be a single file containing both functors and nothing else) by compiling it with the file /cs/cs131/proj_4/make_sml-soln.sml with your file used in place of the two .sml files currently used.

What I have Provided:

  I have created a version of the SML-NJ interpreter with the parser structure SexpParser and the two syntax structures ConcreteSyntax and AbstractSyntax already loaded. The executable (or a link to it) is in /cs/cs131/bin/sml-sexp. The file /cs/cs131/bin/sml-cs131 (which is the version loaded by emacs) has been re-linked to this version.

I have also provided an executable sample solution at /cs/cs131/bin/sml-soln. You can use this to test how your solution should behave (from the standpoint of user interaction). This launches directly into my solutions interpreter.

Finally, I have also created /cs/cs131/bin/sml-synconv which is a version of SML with my SyntaxConverter structure loaded. You can use this to see what the outputs from convert_command and convert_expression should be for various inputs. To make using it easier, I have exported the following names at the top level:

 
val parse = SexpParser.parse_string_for_sexp; 
val conv_exp = SyntaxConverter.convert_expression; 
val conv_com = SyntaxConverter.convert_command; 


This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Wednesday, Mar 3, 1999 at 1:45 PM.
http://cs.hmc.edu/~hodas/courses/cs131/projects/project04.html