Project 05

Interpreter for a Simple Calculator Language
Phase 02 -- From Concrete to Abstract Syntax

Due by 5:00 PM on Friday, March 10, 2000

 

Purpose:

 

In this phase of development of your interpreter for a simple calculator language, you will build the module which converts from the concrete syntax of the language (as typed by the user and returned by the parser) to the abstract syntax (which you manipulate in your evaluator).

The Concrete Syntax of the Language

 

As explained in the last project, the calculator language is divided into two parts: expressions and commands.

Expressions

Expressions have the following syntax:

    binary-op ::= "+" | "-" | "*" | "/"
 
    exp-list ::= epsilon
               | exp exp-list
 
    exp ::= integer
          | real
          | id
          | "(" binary-op exp exp ")"
          | "(" id exp-list ")"
         

Thus they are either numeric literals, identifiers (i.e. variables), the application of a built-in prefix binary operator to a pair of expressions, or the application of a user defined operator to a (possibly empty) list of arguments (epsilon is the empty input). The application of a built-in or user-defined operator to a list of arguments is denoted by wrapping the application in parentheses (which are not optional).

Identifiers are constructed of one or more characters from a rich set of characters (as specified in /cs/cs131/proj_5/sexp-parser/sexp_lex) with no restriction on order of characters. Numbers are written with optional leading unary minus (written with a dash, not tilde) and optional decimal part. (if a decimal point is used, there must be digits before and after the point.) Because of the free syntax of identifiers, spacing is important in source programs. Running identifiers or numbers together with operators will likely lead to a valid identifier, rather than an application. For example (-3e) is the application of the identifier -3e to no arguments; it is not the same as the expression (- 3 e), which subtracts e from 3.

The following are all valid expressions:

3 
a 
(+ 3 2) 
(* a (+ 4 b)) 
(add1 -3.34) 
(power 3 -4)
(power -3 (sqrt (+ 3 c)))
(bigsum 3.4 5 -2 45 13)
(- + *) 
            

Because variable identifiers and function identifiers live in entirely different name spaces, in this last example the meaning is the function - applied to the variables + and *, where those variables are just like any other variables holding numeric values.

Commands

There are four commands, all of which can be used only at the interpreter's prompt. They are exit, which closes the interpreter, bindings, which causes the interpreter to display a list of the current variable and function bindings, define, which takes a variable name and an expression, evaluates the expresion and binds the variable to the result, and defun, which takes an operator name, a list of formal parameters, and an expression for the body of the function, and binds the operator name to that definition. All of the commands are used with enclosing parentheses.

Anything typed at the prompt that does not match one of these commands is assumed to be an expression to be evaluated and its result printed.

Thus the syntax for acceptable inputs at the prompt is:

    input ::= exp
            | "(" "exit" ")"
            | "(" "bindings" ")"
            | "(" "define" id exp ")"
            | "(" "defun" "(" id-list ")" exp ")"
         

Note that the four commands are reserved words, and should not be allowed to occur other than in their use as commands at the top-level interpreter. They cannot be reused as variable identifiers, etc.

The following are valid commands (in the sense that they can be typed at the prompt:

(+ 2 3)
(exit)
(bindings) 
(define x 3) 
(define a (+ 4 b)) 
(define b (power -3 (sqrt (+ 3 c)))) 
(defun (add x y) (+ x y))
(defun (bigsum a b c d e) (+ a (+ b (+ c (+ d e)))))
(defun (- x y) (power x y)) 
            

While it is acceptable (as shown in the last example above) to give new definitions to the built-in operators, since the syntax specification states that they are binary operators, any such redefinition must still be as a binary function.

The Provided Parser

As described above, the concrete syntax of expressions and commands in this language is written using parentheses, with operators coming in prefix position. This is the syntax of Lisp, and many of the languages that followed from it.

In fact, the valid expressions and commands of our language are just a subset of the valid parenthesized expressions constructed from numbers and symbols (identifiers). This syntax is generalized as the syntax of s-expressions. As explained in class, an s-expression is a recursively defined structure that is either a symbol (textual or numerical) or a sequence of s-expressions separated by white space and enclosed in parentheses.

In order to save you the hassle of working directly with strings and low-level parsing, I have built a parser for s-expressions and provided it for you to use. The parser is in the form of a structure, SexpParser (of signature SEXPPARSER), which includes the function parseStringForSexp, which takes a string and converts it to an SML data structure of type sexp, which type is defined in the CONCSYN signature discussed below.

The source code for the parser and support code is in /cs/cs131/proj_5/sexp-parser/. The file make_sml-sexp.sml in that directory was used to build the parser, and will show you the inter-relationships between the files.

The complete signature for the parser is as follows:

signature SEXPPARSER =
sig
 
    type sexp;
 
    exception ParseEOF
    exception ParseError of string
 
    val parseStringForSxpl : string -> sexp list
    val parseStringForSexp : string -> sexp
 
    val parseFile : string -> sexp list
 
    val parseStreamForSexp : TextIO.instream -> sexp
  
end  (* signature SEXPPARSER *)
            

The ParseEOF exception is raised if the string being parsed is empty. The ParseError exception is raised if the string is not a valid s-expression (for example, if it contains unbalanced parentheses).

Representation

In order to give you some experience with syntax manipulation, the provided parser does not convert strings directly into abstract syntax. Rather, as mentioned above, its specification is as:

SexpParser.parseStringForSexp : string -> sexp

where sexp is a type for s-expressions. You can think of this type as an intermediate representation somewhere between the concrete syntax of strings and the abstract syntax of calculator expressions. In fact it is really the abstract syntax of s-expressions.

Here is an example of the sort of structure generated by the parser:

- SexpParser.parseStringForSexp "(a b (c d) e)";
val it = Atom (Symbol "a") $
         Atom (Symbol "b") $
         (Atom (Symbol "c") $ Atom (Symbol "d") $ Atom Nil) $
         Atom (Symbol "e") $
         Atom Nil
         : SexpParser.sexp

 

Notice that all these structures are terminated by Atom Nil. So, for instance, the embedded sexpression (c d) was parsed as (Atom (Name "c") $ Atom (Name "d") $ Atom Nil). That's because, as we saw in class, the expression (c d) is really shorthand for (c . (d . ())), so the $ sign is really just a replacement for the dot in the concrete syntax. (Note that $ has been defined as a right associative infix constructor (like ::) to avoid alot of extraneous parentheses.) The only time you will get a series of expressions separated by $ and not terminated with Atom Nil is if the user has explicitely typed something using the dot notation for sexpressions and left out the terminating (). For now you can assume that this is always a syntax error (though later it will have legitimate uses).

As another example, the user-typed expression (+ (* 3 4) 2) will come back from the parser as:

         Atom (Symbol "+") $
         ( Atom (Symbol "*") $
           Atom (Number (Int 3)) $
           Atom (Number (Int 4)) $
           Atom Nil ) $
         Atom (Number (Int 2)) $
         Atom Nil
         

The full contents of concretesyntax.sig are:

signature CONCSYN =
sig
 
    datatype sexp =
        Atom of atom
      | $ of (sexp * sexp)
 
    and atom =
        Nil
      | Symbol of string
      | Number of number
 
    and number =
        Int of int
      | Real of real
 
    val numberToString : number -> string
    val atomToString : atom -> string
    val sexpToString : sexp -> string
 
end  (* signature CONCSYN *)

Notice that I have provided functions to build strings out of objects of any of the types. You should use these for displaying output or error messages.

 

The Abstract Syntax of the Language

 

As we saw in the last project, at the point of having to evaluate expressions it is much nicer if the user's input has already been transformed into an abstraction of the underlying expression, rather than having to deal with the concrete syntax directly.

For example, for the expression (+ (* 3 4) 2), which we saw above is represented by the sexp value:

         Atom (Symbol "+") $
         ( Atom (Symbol "*") $
           Atom (Number (Int 3)) $
           Atom (Number (Int 4)) $
           Atom Nil ) $
         Atom (Number (Int 2)) $
         Atom Nil
         

the following representation is much closer to the underlying meaning and therefore easier to manipulate:

         Cappl (Operator "+",
                [Cappl (Operator "*",
                        [Cdata (Int 3),
                         Cdata (Int 4)]),
                 Cdata (Int 2)])
         

Similarly, the essence of the command (defun (add1 x) (+ x 1)) is captured in the structure:

         Cdefun (Operator "add1",
                 [Vname "x"],
                 Cappl (Operator "+",
                        [Cvar (Vname "x"),Cdata (Int 1)]))
         

Expressions and commands are represented after conversion by different ML data types: cexpression and ccommand (for "Calculator Expression" and "Calculator Command"). Because it is necessary for the syntax converter to return just a single type, and the user can type either an expression or a command at the input, the user input is always converted to a ccommand, but one of the cases for that type is just a wrapper for an underlying cexpression, as will be seen below.

The full abstract syntax for the language is given by the following signature:

signature ABSYN =
sig
    datatype vname = Vname of string;
    datatype operator = Operator of string
 
    type datavalue
 
    datatype cexpression =
        Cdata of datavalue
      | Cvar of vname
      | Cappl of operator * cexpression list
 
    datatype ccommand =
        Cexpression of cexpression
      | Cdefine of vname * cexpression
      | Cdefun of operator * vname list * cexpression
      | Cbindings
      | Cexit
 
    val datavalueToString : datavalue -> string
    val cexpressionToString : cexpression -> string
end;
            

Notice that the abstract syntax stops at the point of specifying the datavalues. We will use different datavalue definitions as our language grows and gets richer over the next few projects. For now we will just be using numbers, which you can assume are defined by the data type

datatype number = Int of int
       | Real of real
            

The number type is defined as part of a signature called CONCSYN, which provides other bits you don't need to worry about in this phase. Your functors will, however, need to take a structure of signature CONCSYN in order to get the definition for the type number.

You'll notice that the ABSYN signature fully specifies the implementation details of many of the types. This is necessary as we will need to be able to access internals of the data structures in many of the routines making use of structures of this signature. The alternative would have been to leave the type less specified and instead to specify a large set of accessor and constructor functions, but that would have been fairly cumbersome.

This signature has been implemented by the following functor:

functor AbstractSyntaxFun (structure ConcreteSyntax : CONCSYN) =
struct
    type datavalue = ConcreteSyntax.number;
 
    datatype vname = Vname of string;
    datatype operator = Operator of string
 
    datatype cexpression =
        Cdata of datavalue
      | Cvar of vname
      | Cappl of operator * cexpression list
 
    datatype ccommand =
        Cexpression of cexpression
      | Cdefine of vname * cexpression
      | Cdefun of operator * vname list * cexpression
      | Cbindings
      | Cexit
 
    val datavalueToString = ConcreteSyntax.numberToString;
 
    fun cexpressionToString (Cdata data) = datavalueToString data
      | cexpressionToString (Cvar (Vname name)) = name
      | cexpressionToString (Cappl (Operator opr, cexps)) =
      "(" ^ opr ^ (cexpListToString cexps) ^ ")"
 
    and cexpListToString nil = ""
      | cexpListToString (ce::ces) =
       " " ^ cexpressionToString ce ^ cexpListToString ces
end;

         
         

Note that in order to get access to the number datatype defined in the concrete syntax implementation, the concrete syntax structure is taken as an argument to this functor.

What You Must Write:

 

Your task for this phase is to supply a functor which converts from the (almost) concrete syntax of sexpressions (as represented in a value of type sexp) to the abstract syntax of "calculator expressions" (as represented in a value of type ccommand or cexpression).

The Syntax Converter

You must write a functor named SyntaxConverter which implements the following signature:


         
         
signature SYNTAXCONVERTER =
sig
    type sexp;
    type cexpression;
    type ccommand;
 
    exception SyntaxError of string;
    exception InternalError of string;
 
    val convertExpression : sexp -> cexpression;
    val convertCommand : sexp -> ccommand;
end;
         

The interpreter will pass convertCommand the results of parsing whatever the user types at the prompt. The function checks whether it is a command (starting with exit, bindings, define, defun) and if so that it is well-formed. If so, it builds the appropriate abstract syntax for the command (using a call to convertExpression on the expression in the body of a define or defun). If it is not a command (well- or ill-formed) then it is just passed to convertExpression and the result wrapped with Cexpression.

The function convertExpression isn't actually needed from outside and need not have been exported by the signature, but I included it so that I could talk about it.

The functor you write to solve this phase should have the following overall shape:


         
         
functor SyntaxConverterFun(structure ConcreteSyntax : CONCSYN;
                           structure AbstractSyntax : ABSYN;
                              sharing type ConcreteSyntax.number =
                                           AbstractSyntax.datavalue)=
struct
         
   ...
         
end
         

The first few lines (after the struct) should define the required types in terms of the types provided by the structures given to the functor as arguments.

The file you turn in should contain just this functor, and no other code.

What I have Provided:

 

I have created a version of the SML-NJ interpreter with the parser structure SexpParser and the two syntax structures ConcreteSyntax and AbstractSyntax, and the signature SYNTAXCONVERTER already loaded. It also includes the structure Evaluator and the functor UserInterfaceFun from the solution to project 04. Once you have written your implementation of SyntaxConverterFun and build a structure from it, you can use UserInterfaceFun to build a complete interpreter. The executable (or a link to it) is in /cs/cs131/bin/sml-sexp. The file /cs/cs131/bin/sml-cs131 (which is the version loaded by emacs) has been re-linked to this version.

The executable also contains a structure SampleSyntaxConverter built from the sample solution. You can use this to see what the outputs from convertCommand and convertExpression should be for various inputs. To make using it at the SML prompt easier, I have also exported the following names at the top level:

val parse = SexpParser.parseStringForSexp; 
val convExp = SampleSyntaxConverter.convertExpression; 
val convCom = SampleSyntaxConverter.convertCommand; 


This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Wednesday, Mar 1, 2000.
http://www.cs.hmc.edu/~hodas/courses/cs131/projects/project05.html