------------------------

Harvey Mudd College
Computer Science 131
Programming Languages
Fall Semester 1999

Lecture 02

------------------------

Compilers and Interpreters

Just as most of you have, up till now, done most of your programming in imperative languages, I suspect that most of you have worked exclusively with compiler-based languages.

------------------------

A compiler is sometimes called a transformer because what it does is transform a program in one language to a program in another language such that the second program has the same ``meaning'' as the first. Compiling does not bring us any closer to that ``meaning''; we won't have it until the second program is actually executed. The purpose of the compiler is just to get us a program that can be executed so that we can eventually get the meaning.

The typical mode of working with a compiler is that you edit your program in an editor, then you run your compiler on it, then you run the program that results from the compilation process. Then, if there are bugs, you start the cycle over. Note that the compiler has no meaningful interaction with the user.

------------------------

An interpreter is a very different beast. Imagine that when you wanted to work in C you ran GCC and it just gave you a prompt. At that prompt you could type in function definitions, in which case they would be checked for correct syntax and stored away if they were ok.

But suppose you could also just type in C expressions and the system would execute them directly. So, you could type sqrt(9); and the system would tell you 3. That's how an interpreter behaves.

An interpreter takes a program and directly computes its meaning (some value). There is no second object program to execute. Interpreters typically interact heavily with the user. While a program may be written in an editor and loaded into the interpreter for execution, it can also be typed directly into the interpreter. As expressions are typed they are evaluated and their values printed for the user.

This leads to what is known as the read-eval-print loop.

When you enter a function definition the interpreter recognizes that and simply stores it for later use.
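To make the idea of directly computing a meaning concrete, here is a minimal sketch of an interpreter for a toy arithmetic language, written in SML itself. The datatype expr and the function eval are made up for this illustration and are not part of the lecture. They also rely on datatypes and pattern matching, which come later, so treat this as a preview rather than something you need to follow in detail yet.

```sml
(* A toy expression language (hypothetical, for illustration only). *)
datatype expr =
    Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* eval walks the expression and computes its value directly;
   no second program is ever produced. *)
fun eval (Num n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* The expression 3 + (2 * 4). *)
val result = eval (Add (Num 3, Mul (Num 2, Num 4)));
```

A compiler for the same toy language would instead take an expr and produce instructions for some other machine, leaving the actual arithmetic for later.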

------------------------

Now one of the downsides of interpreters is that every time they use a function, they have to go through the process of ``interpreting'' its meaning relative to the arguments it is passed. In general this is slower than executing the equivalent compiled code.

The result is that nowadays many interpreters are not really interpreters, but what are called ``incremental compilers''. The idea is that when you type an expression at the prompt, the system doesn't interpret it, but rather compiles it, runs the compiled code, and then presents you with the result.

If you enter a function definition the system just stores the compiled code for the function as its definition. The ML system we will be using is an incremental compiler.

------------------------

A Brief History of ML

I want to talk now for a couple of minutes about the history of ML. You don't really need to know this but I think that understanding the background behind a language is kind of important because it influences how the language comes to look and be the way it does.

ML was originally developed in the late 1970's at the University of Edinburgh in Scotland by Robin Milner, Mike Gordon, and a handful of collaborators.

Interestingly, ML wasn't originally intended by its designers as a general-purpose stand-alone language. Rather, Gordon was working on a theorem-proving package called LCF (the Logic for Computable Functions) that was designed to prove facts about programs written in a relatively simple language called PCF.

Now theorem proving in that setting is complicated and theorem provers like LCF don't just go off and prove things on their own. You have to give them a lot of hints, which are called tactics, about what to do in various situations.

ML was originally just the custom language used to program the tactics for LCF. Since it was a language in which the user was going to make statements about another object language, Mike Gordon just called it Meta Language. This was inevitably shortened to ML.

Now other people working on the LCF project and related things began to realize that ML was a really nice programming language in its own right. And so it has come to stand alone from the LCF project.

Eventually there were several variants of ML in development and use at a variety of institutions. The inevitable incompatibilities developed, and so in the mid 80's a series of discussions were held, culminating in the standardization of the language. The new language is called SML. Some books say that this stands for ``Standard Meta Language'' but that's a misnomer. ML used to stand for ``Meta Language'' but SML stands for ``Standard ML''.

Most MLs now conform pretty closely to the standard, though there are often a number of non-standard extensions. One ML that continues to develop apart from the standard, and yet has a fairly large following because of some very nice features it includes, is CAML, whose name originally stood for Categorical Abstract Machine Language. It was developed in France by the people at INRIA (the national computer science research institutes) in Paris and Nice and grew out of theoretical work in using a mathematical system called Category Theory as the underlying model of computation.

The ML (I should say SML but I get lazy) we will be using is SML-NJ (Standard ML of New Jersey) which is under continuing development by researchers in the Languages and Applied Logic group at Bell Labs (Lucent) together with researchers at Princeton University and the University of Pennsylvania.

------------------------

Introduction to Standard ML

Now let's start looking at SML for real.

To start SML, you just type sml

As I described, the nature of the read-eval-print loop is such that you can just type an expression and SML will respond with its value. For instance:

3;
3 + 2;

Notice that when SML responded with the value of the expression it also told me the value's type. Types are an extremely important aspect of SML (the feature which most distinguishes it from languages in the LISP family like Scheme and Common Lisp).

I'll talk about what the val it = ... stuff means in a minute.
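For anyone reading these notes without a running system at hand, this is roughly what that exchange looks like. The responses are shown as comments and reflect what SML-NJ typically prints; the exact spacing may differ between versions.

```sml
3;        (* val it = 3 : int *)
3 + 2;    (* val it = 5 : int *)
```

The part after the colon is the type the system inferred for the value.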

------------------------

There are two numeric base types, real and int. The usual operators, +, -, and *, are defined for the integers and for the reals (but not for a mixed pair of an integer and a real, as we will see shortly). Integer division is done with the infix operator div, while / is used for real division.

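Note that - is used only for binary subtraction. The unary negation operator is written '~'. The full complement of relational operators, =, <, >, <=, >=, and <>, is supported as well. Each takes a pair of integers and returns a boolean value. The ordering operators <, >, <=, and >= also accept a pair of reals, but be aware that in SML '97 real is not an equality type, so = and <> cannot be applied to reals (the library function Real.== is provided instead). All of this brings us to the next base type, bool.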

The booleans are a built-in type with two values: false and true. Two booleans may be checked for equality (and inequality), but the ordering comparisons such as < are not defined for them. Booleans can also be combined with not, andalso, and orelse.
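Here is a short sketch that exercises the operators just described. The variable names are arbitrary, and the values in the comments are what the expressions evaluate to.

```sml
val q  = 7 div 2;         (* integer division: 3 *)
val x  = 7.0 / 2.0;       (* real division: 3.5 *)
val n  = ~3 + 5;          (* unary negation is written ~, so this is 2 *)
val lt = 3 < 4;           (* true : bool *)
val ge = 2.5 >= 3.5;      (* false : bool *)
val ne = 3 <> 4;          (* true : bool *)
val b  = (true = false);  (* false: booleans can be compared for equality *)
```

Writing 7 / 2 or 7.0 div 2.0 instead would be rejected by the type checker, since / is defined only on reals and div only on integers.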

------------------------

There are lots of built in functions in ML and function application is pretty much like other languages, except you'll notice that you don't in general use parentheses. So for instance:

Math.sqrt 4.0;

You only use parentheses for grouping and precedence purposes, like:

Math.sqrt (4.0 + 12.0);

Notice that functions are themselves just named values:

Math.sqrt;

op +;

Real.~;
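The responses to those three expressions should look roughly like the comments below (as printed by SML-NJ; details may vary by version). Note that op +, with nothing else to go on, is resolved to its default integer version.

```sml
Math.sqrt;   (* val it = fn : real -> real *)
op +;        (* val it = fn : int * int -> int *)
Real.~;      (* val it = fn : real -> real *)
```

The system prints fn rather than a value because there is no useful way to display a function itself; the type is the informative part.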

------------------------

Now, ML is telling us by the arrow that sqrt is a function which takes a real and returns a real. As I said before, SML takes its types seriously, and if you violate them it will slap you on the wrist:

Math.sqrt 3;

A few functions like plus are overloaded so they work on different types:

3 + 4;
3.0 + 4.0;

But there are limits to this, so if we type:

3 + 4.0;

we'll get an error. In general SML does not support defining your own overloaded operators, though SML-NJ does provide hooks for doing it. Later we'll talk about a much richer concept than overloading, called polymorphism, which allows appropriate functions to work on many different types.
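When you really do need to mix the two numeric types, the usual approach is to convert explicitly. The sketch below uses the built-in functions real (int to real) and round (real to int); the variable names are just for illustration.

```sml
val a = real 3 + 4.0;         (* convert the int to a real first: 7.0 *)
val b = 3 + round 4.0;        (* or convert the real to an int: 7 *)
val c = Math.sqrt (real 9);   (* 3.0, whereas Math.sqrt 9 is a type error *)
```

SML never performs these conversions for you, which is why 3 + 4.0 is rejected rather than silently promoted as it would be in C.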

------------------------

Now, to assign a name to a value you just use:

val x = 3+4;

Notice that the system echoes back the assignment with the final value filled in. Whenever you type an expression that is not an assignment, the system automatically assigns it the name it so you can use the result in subsequent expressions:

3 + 4;
it + 5;

Notice that the result of the second expression now has the name it, so we have lost the previous result.
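Put together, the exchange looks something like this, with the responses shown as comments (as SML-NJ typically prints them). The last line, which binds a name called saved, is my own addition to show how to hold on to a result before it gets replaced.

```sml
val x = 3 + 4;     (* val x = 7 : int *)
3 + 4;             (* val it = 7 : int *)
it + 5;            (* val it = 12 : int, and the earlier 7 is no longer called it *)
val saved = it;    (* val saved = 12 : int *)
```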

------------------------

The last base types are string and char. String constants are written with double-quotes as in C. All the relational operators described above are defined for strings. You may also concatenate two strings using the ^ operator as in:

val s = "Hello " ^ "World!";

Character literals are written in a slightly odd notation, as singleton strings preceded by a '#' character. For example:

val c = #"h";

As above, the usual relational operators are supported for characters.

SML-NJ supports the usual C-like mechanism of using the backslash to write the standard control codes, such as \n for newline and \" for the double-quote character. These may be used in characters and strings.

The function explode takes a string and returns a list of characters (more on lists later) and the function implode does the opposite.
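A small sketch pulling the string and character operations together. The variable names are arbitrary; size is the built-in function that returns the length of a string.

```sml
val s       = "Hello " ^ "World!";   (* "Hello World!" *)
val earlier = "apple" < "banana";    (* true: strings compare lexicographically *)
val before  = #"a" < #"b";           (* true: characters can be compared too *)
val cs      = explode "hi!";         (* [#"h",#"i",#"!"] : char list *)
val back    = implode cs;            (* "hi!" *)
val lines   = "one\ntwo";            (* \n is a single newline character *)
val len     = size lines;            (* 7 *)
```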

------------------------

Actually, I lied, those aren't all the base types. The last is an odd type called unit, that is even simpler than the booleans. It has only one value, (), which is also called unit.

Why does it exist?
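One common answer, offered here as an illustration rather than as the lecture's own: every function in ML must take an argument and return a result, so functions that are called only for their effect need something to return, and functions that need no input need something to take. The sketch below uses the built-in print function; greet is a made-up name.

```sml
val u = ();                     (* val u = () : unit *)
val r = print "hello\n";        (* prints hello, then val r = () : unit *)
fun greet () = print "hi\n";    (* val greet = fn : unit -> unit *)
val g = greet ();               (* prints hi, then val g = () : unit *)
```

In this role unit plays much the same part that void plays in C.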

------------------------

Now, ML has three built-in structured types, tuples, records, and lists.

Tuples are just unlabeled ordered pairs, triples, etc. So for instance, we can have a pair of integers (2,3), a triple of reals (2.3,3.4,5.6), or a quadruple of strings ("this","is","a","test").

Tuples are heterogeneous, which means that each field of the tuple can contain a different type. So, we can have a six-field tuple like:

val bigtup = (1,true,(),"test",(2,"hello","world"),3.14159);

Two tuples can be tested for equality and inequality, and they are equal if each field is equal.

You can select out the fields of a tuple using numeric selectors:

#2 bigtup;

#5 bigtup;

#2 (#5 bigtup);

As it turns out, though, these selectors are rarely used because of the availability of pattern matching, which I'll explain in a bit.
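For reference, here is what those selections evaluate to. The names second, fifth, inner, and same are just for illustration.

```sml
val bigtup = (1, true, (), "test", (2, "hello", "world"), 3.14159);
(* bigtup : int * bool * unit * string * (int * string * string) * real *)

val second = #2 bigtup;          (* true : bool *)
val fifth  = #5 bigtup;          (* (2,"hello","world") : int * string * string *)
val inner  = #2 (#5 bigtup);     (* "hello" : string *)
val same   = ((2, 3) = (2, 3));  (* true: equal field by field *)
```

Notice that the type of a tuple is written with * between the types of its fields, and that the nested triple shows up as a parenthesized type of its own.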

------------------------

Records are similar to tuples but use field names rather than position to distinguish the different fields. So for example:

val emprec1 = {name="Josh",ext=8650};

Fields are selected similarly to tuples:

#ext emprec1;

Records can also be compared for equality and inequality. This is done fieldwise, by the name of the field, not position:

val emprec2 = {name="Ran",ext=8976};

val emprec3 = {ext=8650,name="Josh"};

emprec1 = emprec2;

emprec1 = emprec3;
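Here are the same record examples with their results filled in as comments. The names e, d, and s are just for illustration.

```sml
val emprec1 = {name="Josh", ext=8650};
val emprec2 = {name="Ran", ext=8976};
val emprec3 = {ext=8650, name="Josh"};

val e = #ext emprec1;           (* 8650 : int *)
val d = (emprec1 = emprec2);    (* false: the field values differ *)
val s = (emprec1 = emprec3);    (* true: same fields, and order is irrelevant *)
```

All three records have the same type, which SML-NJ prints with the fields in alphabetical order as {ext:int, name:string}.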

------------------------

Lists are homogeneous variable-length structures. A list has a head and a tail. The head is a single item of some type. The tail is a list of that type. There is a special element named nil that is used to terminate a list. nil is a list of arbitrary type.

The basic notation for lists uses the infix constructor :: which is pronounced `cons' for `construct'.

You would write a list of integers like:

val ilist1 = (1::2::3::nil);

No internal parentheses are needed here because cons associates to the right. So the last expression is equivalent to

(1::(2::(3::nil)));

Notice that cons takes an element on the left and a list on the right. So we can build up a list out of existing ones like:

val ilist2 = (4::ilist1);

Remember, though, that lists must be homogeneous, so:

val badlist = (1::2::3.0::4::nil);

will generate an error.
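A short sketch of the cons examples with their values shown as comments. The names same and rlist are my own; rlist shows one way to repair badlist, by making every element a real.

```sml
val ilist1 = 1 :: 2 :: 3 :: nil;                    (* [1,2,3] : int list *)
val ilist2 = 4 :: ilist1;                           (* [4,1,2,3] : int list *)
val same   = (ilist1 = (1 :: (2 :: (3 :: nil))));   (* true: cons associates to the right *)
val rlist  = 1.0 :: 2.0 :: 3.0 :: 4.0 :: nil;       (* fine: every element is a real *)
```

Building ilist2 does not change ilist1, which is still the three-element list afterwards.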

------------------------

There is a more compact notation for lists that can be used when you can enumerate all the elements of the list. Just use a pair of square brackets with the elements separated by commas. In fact, notice that this is how ML echoed our lists back to us before. The two notations are entirely interchangeable. In the latter notation, nil is written [].

Now I said before that each element of the list is a single item. But these items can be of any first class type in the system. We can build lists of ints as before, as well as lists of pairs of ints and strings as in:

val wierd1 = [(1,"hello"),(2,"goodbye")];

Similarly we can build lists of lists:

val wierd2 = [[1,2],[3,4,5],[6]];

We can even build lists of functions:

val wierd3 = [op *,op div];

val wierd3a = [op *,op /];

You will need to get pretty good at reading ML types and understanding them. For instance, notice the difference between the types of:

wierd1;

and

val wierd5 = ([1,2],["hello","goodbye"]);
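To practice reading these types, here are the same definitions with the types ML infers written out as explicit annotations. The annotations are optional; they are included only so you can check your reading against what the type checker accepts.

```sml
val wierd1  : (int * string) list        = [(1,"hello"),(2,"goodbye")];
val wierd2  : int list list              = [[1,2],[3,4,5],[6]];
val wierd3  : (int * int -> int) list    = [op *, op div];
val wierd3a : (real * real -> real) list = [op *, op /];
val wierd5  : int list * string list     = ([1,2],["hello","goodbye"]);
```

So wierd1 is a list of pairs, while wierd5 is a pair of lists. Notice also that op * is taken as integer multiplication in wierd3 but as real multiplication in wierd3a, because every element of a list must have the same type.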

------------------------

As with tuples and records, lists come with a set of little-used selectors: hd, which gives you the head of a list, and tl, which gives you the tail of a list:

wierd2;

hd wierd2;

tl wierd2;

hd (tl wierd2);

hd (tl (hd (tl wierd2)));

(hd wierd3) (2,3);
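Here are those selections again with their values worked out. The single-letter names are just for illustration.

```sml
val wierd2 = [[1,2],[3,4,5],[6]];
val wierd3 = [op *, op div];

val a = hd wierd2;                  (* [1,2] : int list *)
val b = tl wierd2;                  (* [[3,4,5],[6]] : int list list *)
val c = hd (tl wierd2);             (* [3,4,5] : int list *)
val d = hd (tl (hd (tl wierd2)));   (* 4 : int *)
val e = (hd wierd3) (2,3);          (* 6, since the head of wierd3 is multiplication *)
```

Be aware that hd and tl raise an exception when applied to the empty list, which is one more reason pattern matching is usually preferred.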

------------------------

This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Tuesday, August 31, 1999 at 3:09:10 PM.
http://cs.hmc.edu/~hodas/courses/cs131/lectures/lecture02.html