Purpose:
In this project you will extend and modify the dynamically-scoped Lisp built in the last
project so that it becomes a lexically scoped Scheme interpreter with first-class functions.
This will involve adding storage structures for functions as data and modifying the way in
which s-expressions are stored. Most importantly, it will involve implementing types for
environments and closures that behave in the desired way.
Background:

Lexical Scoping
A major difference between this interpreter and the last is its use of lexical scope.
This requires a deeper interleaving of environments (i.e. lists of
identifier/value binding pairs) into other structures and routines.
For example, it should now be possible to catch all references to undeclared identifiers
at "compile" (i.e., syntax-conversion) time. But that means that the routines in SyntaxConverter
need access to the current environment. Whenever an identifier is encountered, the environment
needs to be checked to see if the identifier is defined. (And, when we are converting the
body of a function, we must treat the formal parameters as having definitions. This
requires augmenting the current environment with dummy bindings for those variables
when descending into a function.)
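The conversion-time check described above might be sketched like this (a toy sketch on a cut-down expression type; the names expr and check are illustrative assumptions, not the provided SyntaxConverter code):

```sml
(* Carry the list of identifiers currently in scope; the formal
   parameters act as (dummy) declarations when descending into a
   function body. *)
datatype expr = Var of string
              | Lambda of string list * expr

exception UndeclaredIdentifier of string

fun check scope (Var x) =
      if List.exists (fn y => y = x) scope then ()
      else raise UndeclaredIdentifier x
  | check scope (Lambda (formals, body)) =
      check (formals @ scope) body
```

Running check [] on a whole program then raises an exception exactly when some identifier is used without a surrounding declaration.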
Don't Panic!!! You don't have to write this yourself.
I'm giving you a functioning SyntaxConverter.

Now, as we have discussed in class, in lexically-scoped languages, when you store a
function definition you must include a pointer to the top of the environment active at the
point of definition. When the function is applied to some arguments, the bindings of the
formal parameters are added to this stored environment (rather than the current
environment) to produce the environment in which the function's body is evaluated. If this
were the only change we were making to the interpreter, it would simply involve adding a
slot for an environment to the structure used to hold function definitions in the function
definition list. Of course, the stored environment would need to list both the variable
and the function bindings.
First-Class Functions
However, the other task for this project is to make functions first-class data objects, which
will require additional, related changes. In one respect it will simplify the
interpreter: there will no longer be separate environments (name spaces) for
functions and variables. Instead there will only be variables, some of which
will have functions as their values.
Unnamed Functions
There will actually be two kinds of functions
in this system: named functions defined in the traditional way with defun, and
unnamed functions entered with the expression syntax:

    (lambda (var_1 ... var_n) body)

Of course, it is possible to bind a variable name to an unnamed function by writing:

    (define name (lambda (var_1 ... var_n) body))

but, as we will see below, this sort of named function will not allow recursion, whereas
those defined with defun will. (While both sorts of function definitions
will construct the same sort of data value when evaluated, the construction of that
value will happen in different ways, so that those resulting from defuns
allow recursion.)
In order to support unnamed functions, the cexpression type includes the case:

| Clambda of vname list * cexpression

When you encounter a Clambda during evaluation, you must construct a closure.
Now, if we were not worried about recursion, then we would let SML handle the details of
creating the pointer to the current environment for us, by just building an ordinary
data structure. In this case the datatype case for closures would be:
| Cclosure of vname list * cexpression (* Formal params & body *)
* (vname * datavalue) list (* Current environment *)
The first two fields would store the information from the Clambda
being evaluated, and the third would hold the current environment at the time the
definition is evaluated.
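In this no-recursion version the evaluation case for a lambda is a one-liner. (A sketch, assuming the non-ref version of Cclosure above; eval and the other cases of cexpression are elided.)

```sml
type vname = string
datatype cexpression = Clambda of vname list * cexpression
datatype datavalue =
    Cclosure of vname list * cexpression
                * (vname * datavalue) list

(* Evaluating a lambda just pairs the formals and body with the
   current environment; SML stores the environment pointer for us. *)
fun eval (Clambda (formals, body)) (env : (vname * datavalue) list) =
    Cclosure (formals, body, env)
```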
Named Functions
However, you will recall from class that in order to support recursion, the
closure constructed by a defun must hold a slightly different
environment. Instead of just storing the current environment in the closure,
we must first add the binding of the function name to the closure to the current environment,
then put a pointer to that resulting environment in the closure. That way the
function's definition will be available for a recursive call. This requires being able to
build a pointer structure with a loop in it (the environment pointed to by the current
environment pointer contains a closure which contains a pointer to that same environment).
Unfortunately (or, perhaps, fortunately, since it avoids a lot of potential errors when
such loops are unintended), you can't produce such loops with ordinary SML data structures.
As we saw in the coverage of streams, a conceptual loop can be produced with a thunk (a delayed
or frozen object). They can also be created with ref cells, which is the approach we will
take here. So the case for closures actually becomes:
| Cclosure of vname list * cexpression (* Formal params & body *)
* (vname * datavalue) list ref (* Current environment *)
In the case of evaluating a lambda, you just store a ref
to the current environment, and leave it at that. For a closure resulting from
evaluating a defun, we must construct the necessary loop.
Tying The Knot
While I won't tell you exactly what to do to produce the loop we need, I'll give you an
example of the general idea. Suppose we define the following data type:

datatype foo = bar | baz of int * foo ref

We can construct some simple values of this type as in:

- val a = bar;
val a = bar : foo
- val b = baz (1, ref bar);
val b = baz (1,ref bar) : foo
- val c = baz (1, ref (baz (2, ref bar)));
val c = baz (1,ref (baz (2,ref bar))) : foo

Of course, if this is all we were going to do, there would be no reason to have put the
ref in the pair (we could just have used baz of int * foo).
But suppose we define the following values:
- val x = ref bar;
val x = ref bar : foo ref
- val y = baz (1, ref (baz (2, x)));
val y = baz (1,ref (baz (2,ref bar))) : foo

While the value of y looks the same as the value of c,
the difference is that the bar pointed to by the innermost ref is
not just any bar, it is the one in the cell pointed to by the ref
in x. So, if we change the value that is at the end of the ref
in x, it will change the one in y. For example, we could make the
following change:
- x := baz (3, ref bar);
val it = () : unit
- y;
val it = baz (1,ref (baz (2,ref (baz (3,ref bar))))) : foo

The trick to creating the loop is to set the ref in x to point
to y, thus creating a loop in y:
- x := y;
val it = () : unit
- y;
val it = baz (1,ref (baz (2,ref (baz (1,%1)))) as %1) : foo

The bit with the percent sign is SML's (rather smart) way of avoiding going into a loop
when printing the value of y.
Paulson, the author of ML for the Working Programmer, refers to this process as
tying the knot.
Thus, when you build the closure for a defun, you must tie the knot in just this way.
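Concretely, the defun case can tie the knot exactly as x := y did above: allocate the ref holding the plain current environment, build the closure around that ref, then assign the extended environment through the ref. (A sketch; makeRecClosure and the stand-in cexpression are assumed names, not the project's actual code.)

```sml
type vname = string
datatype cexpression = Cbody                    (* stand-in for a real body *)
datatype datavalue =
    Cclosure of vname list * cexpression
                * (vname * datavalue) list ref

(* Build the closure for (defun (f x1 ... xn) body), so that the
   environment reachable from the closure contains the closure itself. *)
fun makeRecClosure (f : vname, formals, body, env) =
    let
        val envref  = ref env                   (* start with the plain current env *)
        val closure = Cclosure (formals, body, envref)
    in
        envref := (f, closure) :: env;          (* the loop now exists *)
        closure
    end
```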
Primitive Operators
Actually, I lied. There aren't two kinds of functions in this system, there are three.
The last kind is used for the built-in operations of the language. To make functions fully
first-class, the built-ins need to be first class too. (Well, almost first class: there will not
be any way to create a new one within the language.) The built-in functions are now
referred to as primitive operators, or prim-ops for short.
There is no syntax for prim-ops since, as we said, you can't create new ones.
As data values they will be stored using the following case of the new datavalue type:

| Cprimop of Operator * int

where the integer is the arity of the operator.
Data Values
We have alluded to the new type for data values several times already.
When we first started this project many weeks ago, the data values were
numbers. Then we graduated to numbers and strings, and then to s-expressions.
The important thing all these types had in common was that they were
subtypes of the types already being used in the concrete syntax of the language.
That is, they were all part of the underlying syntax. A number as typed by the
user (or at least as returned by the parser) is the same as a number as stored
in a variable.
Now, however, we have functions as data, and their evaluated values are
different from their syntactic representation (closures as opposed to functions). So
our existing types are inadequate.
Of course, we still need to be able to build pairs (i.e. s-expressions), but at the
leaves of the s-expressions (i.e. where the atoms used to be) are values that might be
numbers, strings, or symbols (i.e. anything that used to be an atom), as well as closures
and prim-ops.
So, the new type definition for datavalue must cover all of these cases.
(Note that the leading
Whereas, during syntax translation in the last project, a quoted object just got wrapped in the
A More General Form Of Application
The last major effect of allowing functions to be first-class values is that, since they may now be
returned as the results of evaluation, the term in head-position in an application no longer needs
to, necessarily, be a named function (built-in or user-defined). Instead it may be an arbitrary
expression. Therefore, the case for applications in AbstractSyntax.cexpression becomes:
| Cappl of cexpression * cexpression list

Evaluating an application is much like before, except that instead of just looking up the
definition of the named function in the function list (assuming it wasn't a built-in), you
take the term in head-position and evaluate it like any other expression. The result of
that evaluation is then applied to the evaluated arguments. Of course, that means the term
in the head can't really be any arbitrary expression. It has to evaluate down to one of
the two kinds of "functions": a prim-op or a closure. If not, a runtime error occurs. To
reiterate, application is now a three-step process:
1. Evaluate the term in head-position.
2. Evaluate the terms in argument-position.
3. Call an apply function which takes the head-position result and the list of
   argument-position results, pattern matches on the type of datavalue in the
   head-position result, and does the appropriate application (or exception raising)
   depending on the type.
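The apply step might be sketched as below. (A sketch only: the datatype cases are cut down, and applyPrimop and eval are assumed helpers, stubbed here so the sketch typechecks.)

```sml
type vname = string
datatype operator = Plus | Car              (* stand-in for the real Operator type *)
datatype cexpression = Cbody                (* stand-in *)
datatype datavalue =
    Cnum     of int
  | Cprimop  of operator * int
  | Cclosure of vname list * cexpression * (vname * datavalue) list ref

exception RuntimeError of string

(* assumed helpers, elided *)
fun applyPrimop (oper : operator) (args : datavalue list) : datavalue =
      raise RuntimeError "elided"
fun eval (body : cexpression) (env : (vname * datavalue) list) : datavalue =
      raise RuntimeError "elided"

(* Step 3: dispatch on the kind of value found in head position. *)
fun apply (Cprimop (oper, arity)) args =
      if length args = arity
         then applyPrimop oper args
         else raise RuntimeError "wrong number of arguments to prim-op"
  | apply (Cclosure (formals, body, envref)) args =
      if length args = length formals
         then eval body (ListPair.zip (formals, args) @ !envref)
         else raise RuntimeError "wrong number of arguments to closure"
  | apply _ _ = raise RuntimeError "attempt to apply a non-function"
```

Note that the closure case extends the stored environment (the one behind the ref), not the environment current at the call site; that is exactly what makes the scoping lexical.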
Primitive Operators Redux
So that you don't have to include all the code for evaluating the primitive operators,
which is pretty much unchanged from the old code for built-ins, the project has been
restructured around a new functor called PrimOpsFun with the following signature:
The values and functions declared in this signature are discussed below. The function that
applies a prim-op to its arguments also checks, at runtime, that it has been given the
right number of them. Note that the runtime arity checking is necessary because it cannot
always easily be checked at syntax-conversion time, since you might pass a prim-op as an
argument to another function in which it is used. For example, you might write:

    (defun (f g x y) (g x y))

Then you could write:

    (defun (add x y) (f + x y))

as well as:

    (defun (bad x y) (f car x y))

Without types to help us, there is no easy way to determine at syntax conversion (compile)
time that the second definition should be rejected. Instead we will leave it till run-time
to be discovered. (Obviously, the same runtime arity-check will need to be done for
closure application as well.)

(In fact, while it is still possible at this point to do some arity checking at
syntax-conversion time, the syntax converter for this project does not do any. The reason
is that the extra-credit portion of this project, described below, involves a change that
effectively means there cannot be compile-time arity errors. Rather than produce two
separate syntax converters for this project, it was decided to just do one which does not
do any arity checking.)

While this is the solution we have chosen for implementing the primitive operators,
because it is conceptually simple, a more interesting solution that would make use of
SML's higher-order features would be to define the datatype case for a primop as:

| Cprimop of (datavalue list -> datavalue) * int
That is, we would store an actual ML function for the primop. That function would be one which
does the work of computing the value of that primop on the arguments it is given. Then we would
do away with the separate function for evaluating primitive operators, and instead apply
the stored ML function directly to the argument list.
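For example, under that alternative representation the initial-environment entry for + might look something like this (a sketch; the Cnum case, plusEntry, and applyPrimop are assumed names):

```sml
datatype datavalue =
    Cnum    of int
  | Cprimop of (datavalue list -> datavalue) * int

(* The binding for + carries the ML function that does the work. *)
val plusEntry =
    ("+", Cprimop (fn [Cnum a, Cnum b] => Cnum (a + b)
                    | _ => raise Fail "+ expects two numbers",
                   2))

(* Applying a prim-op is then just calling the stored function. *)
fun applyPrimop (Cprimop (f, _)) args = f args
  | applyPrimop _ _ = raise Fail "not a prim-op"
```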
The Initial Environment
The value initial_environment in the signature is just that:
an environment that you should use as the
environment when you start up your interpreter. It includes bindings of all the built-in function
names to the appropriate prim-op structures. Thus there is no special treatment of built-in names.
(Well, almost none. See the next paragraph.)
When the user types (+ 2 3), syntax-conversion
just treats the "+" as a variable name (as it would any other identifier in that position),
and converts it to a Cvar. When you evaluate it, like any other variable you will
look it up in the environment, where you will find it bound to the appropriate prim-op,
which you will then apply to the evaluated arguments (if there are the right number of them).
Of course, this means that there is no problem redefining the primitive operators. They are just like
any other variables.
There is one exception to the no-special-treatment-for-built-ins rule that you need to consider.
If you look at the definition of
By the way, in Scheme it was standard to name predicate (truth-valued) functions with a
trailing question mark. (Some versions of Lisp used a trailing p instead.)
Your Job (Whether Or Not You Choose To Accept It):
After all that, there's not much to actually say here.
You are to submit new versions of
Most of the changes to
The definition of
How About A Little Extra Credit?
Consider the following definitions:

    (defun (add x y) (+ x y))
    (defun (make_adder n) (add n))

As the system is specified at this point, if make_adder was ever applied to an argument,
it would generate a runtime error stating that add was applied to too few arguments.
It would be nice, however, to recognize that we really wish to build a partial
application, and, at that point, curry the definition of add.

Similarly, once we support this change, it would be reasonable to assume that application
associates to the left as in ML. Thus the interpreter should view the following as
equivalent:

    (define seven ((make_adder 3) 4))
    (define seven (make_adder 3 4))

Thus it should be syntactically acceptable to apply a function to more arguments than it
is defined to take, under the assumption that after using its defined number of arguments
the function will return a function which will then be applied to the rest. This could, of
course, lead to a runtime error if the function does not return some sort of function.
Of course, there is no reason to restrict this treatment to user-defined functions.
It is just as reasonable to define

    (defun (make_adder n) (+ n))

without the intermediate definition of add.

For extra credit, rewrite the evaluator so that this automated currying/uncurrying
occurs. Note that, in order to make the syntax well-behaved, it is necessary to assume
that certain "misuses" of a function are still wrong. In particular, applying a function
of non-zero arity to no arguments is wrong (since it does not accomplish anything), and
applying a zero-arity function to arguments is also bad (since it makes the syntax for
function application somewhat ambiguous). Both of these should generate runtime errors.
(Recall, this is why ML does not have zero-arity functions, and instead has unit; it
avoids this ambiguity.)
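The curried application discipline can be sketched on a self-contained toy model, where a "function" is just an SML function paired with its arity. (All names here are illustrative assumptions; this is the counting logic only, not the project's evaluator.)

```sml
(* Toy model: a value is either a number or a function of known arity. *)
datatype value =
    Num  of int
  | Func of int * (value list -> value)      (* arity, and the work *)

fun applyCurried (Func (arity, work)) args =
      let val n = length args
      in
          if arity = 0 orelse n = 0
             then raise Fail "zero-arity application"       (* the two misuses *)
          else if n < arity
             then (* curry: return a function awaiting the remaining args *)
               Func (arity - n, fn rest => work (args @ rest))
          else if n = arity
             then work args
          else (* extra args: consume arity of them, then apply the result *)
             applyCurried (work (List.take (args, arity)))
                          (List.drop (args, arity))
      end
  | applyCurried (Num _) _ = raise Fail "attempt to apply a non-function"

(* A two-argument + modelled this way: *)
val plus = Func (2, fn [Num a, Num b] => Num (a + b)
                     | _ => raise Fail "+ expects numbers")
(* applyCurried plus [Num 3] yields a one-argument adder, and
   applyCurried (applyCurried plus [Num 3]) [Num 4] yields Num 7. *)
```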
This page copyright ©1998 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Fri, Apr 10, 1999 at 3:21:15 PM.
http://cs.hmc.edu/~hodas/courses/cs131/projects/project08.html