|
|
|
Purpose: |
|
|
|
In this project you will extend and modify the dynamically-scoped mini-Lisp built in the last project so that it becomes a lexically scoped mini-Scheme interperter with first-class functions. This will involve adding storage structures for functions as data and modifying the way in which s-expressions are stored. Most importantly, it will involve implementing types for environments and closures that behave in the desired way. |
Background: |
|
|
|
|
|
Lexical Scoping |
A major difference between this interpreter and the last
is its use of lexical scope. This requires a deeper
interleaving of environments (i.e. lists of identifier/value
binding pairs) into other structures and routines. For
example, it should now be possible to catch all references
to undeclared identifiers at "compile" (i.e.,
syntax-conversion) time. But that means that the routines in
Don't Panic!!! You don't have to write this yourself. I'm
once again giving you a functioning
Now, as we have discussed in class, in lexically-scoped languages, when you store a function definition you must include a pointer to the top of the environment active at the point of definition. When the function is applied to some arguments, the bindings of the formal parameters are added to this stored environment (rather than the current environment) to produce the environment in which the function's body is evaluated. If this were the only change we were making to the interpreter then it would simply involve adding a slot for an environment to the structure used to hold function definitions in the function definition list. Of course the stored environment would need to list both the variable and the function bindings. |
|
First- Class Functions |
However, the other task for this project is to make functions first-class data objects, which will require additional, related changes. In one respect it will simplify the interpreter: there will no longer be separate environments (name spaces) for functions and variables. Instead there will only be variables, some of which will have functions as their values. |
|
Unnamed Functions |
There will actually be two kinds of functions in this
system: named functions defined in the traditional way with
Of course, it is possible to bind a variable name to an unnamed function by writing:
but, as we will see below, this sort of named function
will not allow recursion, whereas those defined with
In order to support the
As described in class, the result evaluating a
Now, if we were not worried about recursion, then we
would let SML handle the details of creating the pointer to
the current environment for us, by just building an ordinary
data structure in this case for
| Cclosure of vname list * cexpression (* Formal params & body *)
* (vname * datavalue) list (* Current environment *)
The first two fields would store the information from the
|
|
Named Functions |
However, you will recall from class that in order to
support recursion, the closure constructed by a
This requires being able to build a pointer structure with a loop in it (The environment pointed to by the current environment pointer contains a closure which contains a pointer to the current environment.). Unfortunately (or, perhaps, fortunately, since it avoids alot of potential errors when such loops are unintended), you can't produce such loops with ordinary SML data structures. However, a loop can be created with
| Cclosure of vname list * cexpression (* Formal params & body *)
* (vname * datavalue) list ref (* Current environment *)
In the case of a evaluating a |
|
Tying The Knot |
While I won't tell you exactly what to do to produce the loop we need, I'll give you an example of the general idea. Suppose we define the following data type:
We can construct some simple values of this type as in:
Of course, if this is all we were going to do, there
would be no reason to have put the
While the value of
The trick to creating the loop is to set the
The bit with the percent sign is SML's (rather smart) way
of avoiding going into a loop printing the value of
Paulson, the author of ML for the Working Programmer, refers to this process as tying the knot. Thus, when you build the Cclosure for a defun, you first put a dummy environment at the end of the ref in the closure, then eventually replace that value with the overall environment that includes the binding of the function name to the closure, tying the knot. |
|
Primitive Operators |
Actually, I lied. There aren't two kinds of functions in
this system, there are three. The last kind is used for the
built-in operations of the language. To make functions fully
first-class, the built-ins need to be first class too.
(Well, almost first class: there will not be any way to
create a new one within the language.) Scheme built-in
functions are refered to as primitive operators, or
prim-ops for short. There is no syntax for prim-ops
since, as we said, you can't create new ones. As data values
they will be stored using the following case of the new
where the integer is the arity of the operator. The name of the operator is used by applyPrimOp (the replacement for applyBuiltIn) as usual to determine what action to take. Because of this new treatment of the built-in functions, you will end up stripping away about half the code in your evaluator functor. (While this is the solution we have chosen for implementing the primitive operators, because it is conceptually simple, a more interesting solution that would make use of SML's higher-order features would be to define the datatype case for a prim-op as:
That is, we would store an actual ML function for the
prim-op. That function would be one which does the work of
computing the value of that prim-op on the arguments it is
given. Then we would do away with the function
|
|
Data Values |
We have alluded to the new type for data values several times already. When we first started this project many weeks ago, the data values were numbers. Then we graduated to numbers and strings, and then to s-expressions. The important thing all these types had in common was that they were subtypes of the types already being used in the concrete syntax of the language. That is, they were all part of the underlying syntax. A number as typed by the user (or at least as returned by the parser) is the same as a number as stored in a variable. Now, however, we have functions as data, and their
evaluated values are different from their syntactic
representation (closures as opposed to functions). So our
existing types are inadequate. Of course, we still need to
be able to build pairs (i.e. s-expressions), but at the
leaves of the s-expressions (i.e. where the atoms used to
be) are values that might be numbers, strings, or symbols
(i.e. anything in So, the new type definition for
(Note that the leading Whereas, during syntax translation in the last project, a
quoted object just got wrapped in the |
|
The last major effect of allowing functions to be
first-class values is that, since they may now be returned
as the results of evaluation, the term in head-position in
an application no longer needs to, necessarily, be a named
function (built-in or user-defined). Instead it may be an
arbitrary expression. Therefore, the case for applications
in
Evaluating an application is much like before, except that instead of just looking up the definition of the named function in the function list (assuming it wasn't a built-in), you take the term in head-position and evaluate it like any other expression. The result of that evaluation is then applied to the evaluated arguments. Of course, that means the term in the head can't really be any arbitrary expression. It has to evaluate down to one of the two kinds of "functions": a prim-op, or a closure. If not, a runtime error occurs. To reiterate, application is now a three step proccess:
The last step is best handled by defining an auxilliary
|
|
|
So that you don't have to include all the code for
evaluating the primitive operators, which is pretty much
unchanged from the old code for built-ins, the project has
been restructured around a new functor called
The values The function Note that the runtime arity checking is necessary because it cannot always easily be checked at syntax-conversion time, since you might pass a prim-op as an argument to another function in which it is used. For example, you might write:
Then you could write:
as well as:
Without types to help us, there is no easy way to determine at syntax conversion (compile) time that the second definition should be rejected. Instead we will leave it till run-time to be discovered. (Obviously, the same runtime arity-check will need to be done for closure application as well.) (In fact, while it is still possible at this point to do some arity checking at syntax-conversion time, the syntax converter for this project does not do any. The reason is that the extra-credit portion of this project, described below, involves a change that effectively means there cannot be compile-time arity errors. Rather than produce two separate syntax converters for this project, it was decided to just do one which does not do any arity checking.) |
|
|
The Initial Environment |
The value There is one exception to the
no-special-treatment-for-buit-ins rule that you need to
consider. If you look at the definition of
By the way, in Scheme it was standard to name predicate
(truth-valued) functions with a trailing question mark.
(Some versions of lisp used the name |
What I Have Provided: |
|
|
|
I have created a version of the SML-NJ interpreter with
the parser structure The executable (or a link to it) is in
You may find it useful to diff the new version of syntaxconverter.sml against the one from the last project to get a sense of the changes. (I have also loaded a sample solution to the evaluator functor and built a structure called testEvaluator as well. You can use the functions in this structure from the SML command line to see what the results of evaluation should be.) I have also provided an executable sample solution at
The file Finally, the file /cs/cs131/proj_8/functorHeaders.sml contains the functor headers as you should use them in your code. |
Your Job (Whether Or Not You Choose To Accept It): |
|
|
|
After all that, there's not much to actually say here.
You are to submit new versions of
Most of the changes to The definition of |
How About A Little Extra Credit? |
|
|
|
Consider the following definitions:
As the system is specified at this point, if
It would be nice, however, to recognize that we really
wish to build a partial application, and, at that point,
curry the definiton of Similarly, once we support this change, it would be reasonable to assume that application associates to the left as in ML. Thus the interpreter should view the following as equivalent:
Thus it should be syntactically acceptable to apply a function to more arguments than it is defined to take, under the assumption that after using its defined number of arguments the function will return a function which will then be applied to the rest. This could, of course, lead to a runtime error if the function does not return some sort of function. Of course, there is no reason to restrict this treatment
to user defined functions. It is just as reasonable to
define
without the intermediate definition of
For extra-credit, rewrite the evaluator so that this automated currying/uncurrying occurs. Note that, in order to make the syntax well-behaved, it is necessary to assume that certain "misuses" of a function are still wrong. In particular, applying a function of non-zero-arity to no arguments is wrong (since it does not accomplish anything), and applying a zero-arity function to arguments is also bad (since it makes the syntax for function application somewhat ambiguous). Both of these should generate runtime errors. (Recall, this is why ML does not have zero-arity functions, and instead has unit; it avoids this ambiguity.) |
|
|
This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Sunday, Apr 9, 2000. |
http://www.cs.hmc.edu/~hodas/courses/cs131/projects/project08.html |
|