------------------------

Harvey Mudd College
Computer Science 131
Programming Languages
Spring Semester 1999

Lecture 07 (2/10/99)

------------------------

------------------------

Even just the part of the ML type system shown so far is a pretty big win for the programmer. ML's real strength, though, is that type inference works not just for the built-in base types but extends naturally to types that you define yourself.

This is where the rubber hits the road in ML!

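User-defined types can be categorized pretty much as follows: type identities, enumerated types, and structured (or constructed) types.

The last two are actually defined using the same mechanism, the datatype declaration; an enumerated type is just the special case of a structured type whose constructors take no arguments.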

You can also categorize the user defined types by distinguishing between:

------------------------
The simplest user-defined type is the type identity, which is defined with the type declaration. Type identities are just a tool that lets you give some of the types you use names that are more appropriate to your individual application. They don't create any new types; they just give new names to existing types.

For example, suppose you are writing a personnel database application, and you are storing names and social security numbers both as strings. It may be handy for you to annotate your functions with more meaningful types than just strings.

To do this you feed ML the following type identity declarations:

type name = string;
type ssn = string;
type database = (name * ssn) list;

Now we can write the function that looks up a person's name based on their social security number:

fun lookup socnum (nil:database) = NONE
  | lookup socnum ((nm,sn)::tail) = 
       if (socnum = sn)
         then SOME nm
         else lookup socnum tail;

So we see that ML will attempt to report its function types in terms of the information we've given it.

This will not always be entirely successful, however. The type reported for this function is not ssn -> database -> name option, and the reason is that type identities really are only new names: name, ssn, and string are all the same type, so when the inference algorithm works out that socnum and the result are strings, it has no reliable way of knowing which of those names (if any) you had in mind.

In general, type identities will be most useful when you are designating types and want ML to confirm them, rather than when you are inferring types.
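As a sketch of the "designating types" style, the version below annotates the argument and result types explicitly, which gives the compiler the names we want it to use when it reports the function's type (exactly how a given implementation prints it may vary). The value confused, a hypothetical name, shows the other side of type identities: since name and ssn are both just string, passing a name where an ssn is expected is not caught.

```sml
type name = string;
type ssn = string;
type database = (name * ssn) list;

(* Explicit annotations on the arguments and the result say which names we mean. *)
fun lookup (socnum : ssn) (nil : database) : name option = NONE
  | lookup socnum ((nm, sn) :: tail) =
      if socnum = sn then SOME nm else lookup socnum tail;

(* Type identities are only aliases: a name is accepted where an ssn is
   expected, because both are simply string. *)
val confused = lookup ("Alice" : name) [("Alice", "123-45-6789")];
```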

------------------------

New types in ML are defined with the datatype declaration.

The simplest kind of datatype is an enumerated type in which you supply the full range of possible values. For example, we could define:

datatype day_of_week = 
             Monday   | Tuesday | Wednesday 
           | Thursday | Friday  | Saturday  | Sunday;

SML responds that you have defined a new type as well as seven new constructors for that type. In this case, the constructors don't take any arguments.

Note that SML-NJ echoes them back to you in alphabetical order, not the order you declared them in. This is important. Unlike enumerated types in Pascal or C, ML's enumerated types are not assumed to be ordered in any particular way. If you want to define ord and the relational functions to work on them, that's fine, but you'll have to define them manually. The only thing you get for free is equality checking.
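A minimal sketch of what defining these by hand might look like follows. The names dayOrd and dayLess, and the decision to start the week at Monday = 0, are our own assumptions; ML itself imposes no order.

```sml
datatype day_of_week =
             Monday   | Tuesday | Wednesday
           | Thursday | Friday  | Saturday  | Sunday;

(* Hypothetical helper: we choose the numbering ourselves, Monday = 0 ... Sunday = 6. *)
fun dayOrd Monday    = 0
  | dayOrd Tuesday   = 1
  | dayOrd Wednesday = 2
  | dayOrd Thursday  = 3
  | dayOrd Friday    = 4
  | dayOrd Saturday  = 5
  | dayOrd Sunday    = 6;

(* Hypothetical relational operator defined in terms of that numbering. *)
fun dayLess (d1, d2) = dayOrd d1 < dayOrd d2;
```

Equality, by contrast, needs no such help: Tuesday = Tuesday and Tuesday <> Friday work directly.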

Now, we can define functions in terms of this type. For instance:

fun workday Saturday = false
  | workday Sunday = false
  | workday _ = true;

val weekend = not o workday;

While these are similar to the enumerated types of C and Pascal, there are some important differences. As you should not be surprised to discover, user-defined enumerations in ML, unlike those in C, are not just a set of aliases for small integers. In C++, by contrast, you can write the following program:

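#include <iostream>
using namespace std;

enum Boolean : int { False, True };  // fixed underlying type (C++11), so 3 is a valid value

int main()
{
  Boolean x = True;
  cout << x << endl;   // prints 1: the enumerator converts straight to an int

  x = (Boolean) 3;     // accepted, although no enumerator has the value 3
  cout << x << endl;   // prints 3
  return 0;
}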

and the program will print out 1 and 3. If you leave out the cast in the assignment, some compilers will give you a warning but nothing else (standard C++ actually forbids the implicit conversion from int, and current compilers reject it). Thus the enumerated types in C++ are little more than a shorthand for defining a set of integer constants. In ML there is no apparent connection between an enumerated type and any base type.

The second key difference is that we can't define what Pascal calls range types. I.e. the following is illegal:

datatype weekday = Monday..Friday;

Why not? It's not just that there is no implied ordering on the day_of_week type. We also can't do it on the integers (which do have an ordering), as in:

datatype small = 0..255;

Why not? Because, in order for ML to infer types, every value must belong to exactly one type: if 3 were both an int and a small, the system could not tell whether the literal 3 should be typed as an int or as a small. There has been a great deal of work over the last 10 years in extending the type inference algorithm to handle such systems of subtypes, and it is likely that the follow-on to ML (now being developed by a joint committee called ML 2000) will feature some such extensions.
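The usual way around this is to give the restricted values a label of their own and check the range at run time instead of in the type system. The sketch below does so for the small example; small, Small, mkSmall, and smallToInt are hypothetical names, and the standard Domain exception signals an out-of-range argument. Note that nothing here stops a client from writing Small 1000 directly; hiding the constructor is exactly what the abstype mechanism later in this lecture is for.

```sml
(* Hypothetical stand-in for the illegal "datatype small = 0..255":
   a distinct labelled type, so no value is ever both an int and a small. *)
datatype small = Small of int;

(* Checking constructor: the range test happens at run time. *)
fun mkSmall n =
    if 0 <= n andalso n <= 255 then Small n else raise Domain;

fun smallToInt (Small n) = n;
```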

As one more example of an enumerated type, we could define a new kind of boolean that allows for ambiguity in our reasoning:

datatype fuzzybool = yes | no | maybe;

fun And (yes,yes) = yes
  | And (yes,no) = no
  | And (yes,maybe) = maybe
  | And (no,yes) = no
  | And (no,no) = no
  | And (no,maybe) = no 
  | And (maybe,yes) = maybe
  | And (maybe,no) = no
  | And (maybe,maybe) = maybe;

infix And;
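Since clauses are tried in order, the nine cases of And can be condensed with wildcard patterns. The sketch below does so and adds Or and Not in the same style; those two are hypothetical additions following the usual three-valued (Kleene) rules, not part of the original example. The infix declarations come after the definitions so that the fun clauses can use the names in prefix position without op.

```sml
datatype fuzzybool = yes | no | maybe;

(* Same table as above: any "no" gives "no", two "yes" give "yes", else "maybe". *)
fun And (no, _)    = no
  | And (_, no)    = no
  | And (yes, yes) = yes
  | And _          = maybe;

(* Hypothetical companion: any "yes" gives "yes", two "no" give "no", else "maybe". *)
fun Or (yes, _)  = yes
  | Or (_, yes)  = yes
  | Or (no, no)  = no
  | Or _         = maybe;

(* Hypothetical companion: swap yes and no, leave maybe alone. *)
fun Not yes   = no
  | Not no    = yes
  | Not maybe = maybe;

infix And;
infix Or;
```

With the default infix precedence of 0, an expression such as yes And maybe = maybe parses as yes And (maybe = maybe), so comparisons need parentheses: (yes And maybe) = maybe.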

At this point you should realize that having types like the booleans and unit built in is just a convenience. They could all be defined by the user.

------------------------

Now, while enumerated types are handy, they are a simple addition. Structured types are the more interesting kind. In a structured type, constructors are not just constants; rather, they expect arguments.

So, for instance, we could declare a new type for complex numbers built out of two reals as follows:

datatype complex_number = 
           complex of real * real;

Now SML tells us that we have defined a new type with a single constructor. The constructor is kind of like a function, but it doesn't do anything other than a type translation: by putting the constructor complex in front of a pair of reals, that pair becomes a complex_number.

Now I can write some functions that work on complex numbers:

fun im (complex (_,imag)) = imag;
fun re (complex (r,_)) = r;

fun add (complex (real1,imag1))   
        (complex (real2,imag2)) = 
      complex (real1+real2,imag1+imag2);
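Other operations follow the same pattern of matching on the constructor. For instance, multiplication and magnitude might be sketched as follows; mult, magnitude, and minusOne are hypothetical names, not part of the lecture.

```sml
datatype complex_number =
           complex of real * real;

fun re (complex (r, _)) = r;
fun im (complex (_, i)) = i;

(* Hypothetical addition: (a + bi)(c + di) = (ac - bd) + (ad + bc)i *)
fun mult (complex (a, b)) (complex (c, d)) =
      complex (a * c - b * d, a * d + b * c);

(* Hypothetical addition: the modulus sqrt(a^2 + b^2) *)
fun magnitude (complex (a, b)) = Math.sqrt (a * a + b * b);

(* i * i should come out as -1 + 0i *)
val minusOne = mult (complex (0.0, 1.0)) (complex (0.0, 1.0));
```

Since real is not an equality type in Standard ML '97, complex_number is not one either, so results like minusOne are checked component by component with a tolerance rather than with =.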

Because a constructed type is genuinely new rather than just another name for an existing type, the ambiguities that arose with type identities do not exist here: ML will report the types of these functions in terms of complex_number.

------------------------

One important use of constructed types is the creation of what are called labelled unions. Suppose that we want to build some part of a system so that it can accept ints, reals, or strings. For instance, when we are building our interpreter we will be trying to handle an untyped language, which means that we can freely intermingle types in an expression (possibly at the expense of an eventual run time error).

So far we have no way of combining elements of different type in the same context. But we can use type constructors to accomplish it. We will define one meta-type called value that will be able to hold the different types we want to be able to combine. We will accomplish this by giving value three different constructors, one for each of the underlying types we want to handle. That is:

datatype value = Int of int | Real of real 
               | String of string;

Without labels we could never combine types, because, given an element of one of the underlying types, say the number 3, the system wouldn't know if we meant to be treating it as a member of the underlying type, int, or as a value. This is the same reason we couldn't declare range types earlier.

With the labels there is no such ambiguity. Now we can write a function to add two values:

fun add (Int i1)    (Int i2)    = Int (i1 + i2)
  | add (Int i)     (Real r)    = Real ((real i) + r)
  | add (Int i)     (String s)  = String ((Int.toString i) ^ s)
  | add (Real r)    (Int i)     = Real ((real i) + r)
  | add (Real r1)   (Real r2)   = Real (r1 + r2)
  | add (Real r)    (String s)  = String ((Real.toString r) ^ s)
  | add (String s)  (Int i)     = String (s ^ (Int.toString i))
  | add (String s)  (Real r)    = String (s ^ (Real.toString r))
  | add (String s1) (String s2) = String (s1 ^ s2);
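One payoff of the labels is that values of different underlying types can now live in the same list, and a function can recover which kind each one is by pattern matching. The helpers show and typeOf and the list mixed are hypothetical names for illustration.

```sml
datatype value = Int of int | Real of real
               | String of string;

(* Hypothetical helper: render any value, choosing the conversion by its label. *)
fun show (Int i)    = Int.toString i
  | show (Real r)   = Real.toString r
  | show (String s) = "\"" ^ s ^ "\"";

(* Hypothetical helper: recover the name of the underlying type from the label. *)
fun typeOf (Int _)    = "int"
  | typeOf (Real _)   = "real"
  | typeOf (String _) = "string";

(* One list holding an int, a real, and a string, all as values. *)
val mixed = [Int 3, Real 2.5, String "abc"];
```

For example, map show mixed produces one string per element, whatever its underlying type.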

------------------------

Suppose we want to define something like the option types. We can define an integer option as in:

datatype intoption = NONE | SOME of int;

We could go on from here and define a real option type and a string option type and so on, but it would be very putzy and somehow go against the grain of polymorphism. Instead, SML allows you to define polymorphic structured types:

datatype 'a option = NONE | SOME of 'a;
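With the polymorphic definition, a single set of functions works for every element type. The sketch below redefines option as above and adds two such functions; withDefault and mapOpt are hypothetical names, playing the same roles as the Basis Library's getOpt and Option.map.

```sml
datatype 'a option = NONE | SOME of 'a;

(* Hypothetical helper: the contents if present, otherwise the default. *)
fun withDefault (SOME x, _) = x
  | withDefault (NONE, d)   = d;

(* Hypothetical helper: apply f inside the option, if there is anything there. *)
fun mapOpt f NONE     = NONE
  | mapOpt f (SOME x) = SOME (f x);
```

The same two functions serve int option, string option, and so on, which is exactly what separate intoption-style declarations could not do.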

------------------------

Perhaps the most important kind of user defined type that can be declared is the recursive type. These are used for building dynamic data structures like trees and lists.

As a simple example, let's suppose that lists had not been provided as a built-in type. We could define them ourselves by providing a base case for the type, the empty list, as well as a recursive case that builds a longer list from an element and a shorter list.

First, let's look at defining a type just for integer lists:

datatype intlist = empty 
                 | icons of int * intlist;

Now, we can write any of the traditional list functions to work over this sort of integer list:

fun imemb _ empty = false
  | imemb i (icons (h,t)) = if (i = h)
                              then true
                              else imemb i t;
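A few more of the traditional functions, sketched in the same style; isum, iappend, and ilength are hypothetical names.

```sml
datatype intlist = empty
                 | icons of int * intlist;

(* Sum of the elements: 0 for the base case, head plus sum of tail otherwise. *)
fun isum empty = 0
  | isum (icons (h, t)) = h + isum t;

(* Append: rebuild the first list on top of the second. *)
fun iappend (empty, ys) = ys
  | iappend (icons (h, t), ys) = icons (h, iappend (t, ys));

(* Length: count the icons cells. *)
fun ilength empty = 0
  | ilength (icons (_, t)) = 1 + ilength t;
```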

As above, we can extend the idea to polymorphic lists:

datatype 'a lst = empty 
                | cons of 'a * 'a lst;

fun memb _ empty = false
  | memb i (cons (h,t)) = if (i = h)
                            then true
                            else memb i t;
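The polymorphic version also supports higher-order functions such as map. The sketch below adds a length and a map for 'a lst, plus two converters to and from the built-in lists that make experimenting easier; len, lmap, fromList, and toList are all hypothetical names.

```sml
datatype 'a lst = empty
                | cons of 'a * 'a lst;

fun len empty = 0
  | len (cons (_, t)) = 1 + len t;

(* Apply f to every element, keeping the list structure. *)
fun lmap f empty = empty
  | lmap f (cons (h, t)) = cons (f h, lmap f t);

(* Hypothetical converters between 'a lst and the built-in 'a list. *)
fun fromList [] = empty
  | fromList (h :: t) = cons (h, fromList t);

fun toList empty = []
  | toList (cons (h, t)) = h :: toList t;
```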

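We can make it feel more like the built-in list type by using an infix constructor. Declaring it with infixr 5, the same fixity as the built-in ::, makes it right-associative, so that 1:::2:::nl means 1:::(2:::nl):

infixr 5 :::;

datatype 'a lst = nl | op ::: of 'a * 'a lst;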

fun memb _ nl = false
  | memb e (h:::t) = if (e = h) 
                       then true
                       else memb e t;
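Here is a sketch of that version in use. It declares ::: with infixr 5, the same fixity as the built-in ::, so that a chain such as 1 ::: 2 ::: 3 ::: nl groups to the right the way a list should; the op keyword lets the datatype mention the constructor in prefix position after it has been made infix. The names xs, append, and toList are hypothetical.

```sml
infixr 5 :::;
datatype 'a lst = nl | op ::: of 'a * 'a lst;

(* Right associativity: this is 1 ::: (2 ::: (3 ::: nl)). *)
val xs = 1 ::: 2 ::: 3 ::: nl;

(* The infix constructor reads naturally in patterns as well. *)
fun append (nl, ys) = ys
  | append (h ::: t, ys) = h ::: append (t, ys);

(* Hypothetical converter to a built-in list, handy for checking results. *)
fun toList nl = []
  | toList (h ::: t) = h :: toList t;
```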
------------------------

Next I want to talk about one of the ways that you can define a type and restrict access to its internals so that it is truly an abstract data type.

The first mechanism that the designers of ML incorporated for this purpose is called the abstype declaration. It is somewhat similar to a class definition with private members in C++, in that it limits who may use the constructors of a type and take its values apart by pattern matching.

This mechanism is not so frequently used today because the alternative, modules, which we will talk about next class, offers many advantages. It is, nevertheless, important to cover because the module system, even though it is part of the standard, is not considered part of the "core" language, and you may find implementations that don't support it.

Suppose that we wish to implement a data type to represent the natural numbers (as distinct from the integers).

One option is to build them on top of the integers as follows:

(* An implementation of natural numbers using integers *)
 
datatype nat = Nat of int;
 
exception Pred;
 
val zero = Nat 0;
 
fun iszero n = (n = zero);

fun succ (Nat n) = Nat (n+1);
 
fun pred (Nat 0) = raise Pred
  | pred (Nat n) = Nat (n - 1);
 
fun add_nat (Nat i) (Nat j) = Nat (i+j);
 
fun mult_nat (Nat i) (Nat j) = Nat (i * j);

fun nat_to_int (Nat n) = n;

Another option is to build a representation based on lists of the appropriate length. Only the length of the list matters, not its contents, so we'll use lists of unit.

(* An implementation of natural numbers using unit lists *)
 
datatype nat = Nat of unit list;
 
exception Pred;
 
val zero = Nat [];
 
fun iszero n = (n = zero);

fun succ (Nat n) = Nat (()::n);
 
fun pred (Nat []) = raise Pred
  | pred (Nat (()::prd)) = Nat (prd);
 
fun add_nat (Nat []) (Nat j) = Nat j
  | add_nat i j = succ (add_nat (pred i) j);
 
fun mult_nat (Nat []) (Nat j) = zero
  | mult_nat i j = add_nat j (mult_nat (pred i) j);

fun nat_to_int (Nat n) = length n;

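This will work equally well, just more slowly: add_nat takes time proportional to its first argument, and mult_nat takes time proportional to the product of its arguments.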

The problem is that if I give you one of these implementations, you can use it, but you can also directly access the internals of any number you build and play with them yourself. This goes against proper notions of modularity. The solution is the abstype.
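To make the problem concrete, here is a sketch using the integer-based implementation (trimmed and repeated so the block stands alone; bogus and worse are hypothetical names). A client can apply the exposed constructor Nat to a negative integer, and pred then carries it further below zero rather than raising Pred.

```sml
datatype nat = Nat of int;

exception Pred;

fun pred (Nat 0) = raise Pred
  | pred (Nat n) = Nat (n - 1);

fun nat_to_int (Nat n) = n;

(* Nothing stops a client from applying the constructor to a negative int... *)
val bogus = Nat ~3;

(* ...and pred then goes further below zero instead of raising Pred. *)
val worse = pred bogus;
```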

An abstype declaration looks much like a datatype declaration, but it exports no constructors, so outside the declaration values of the type cannot be built or taken apart by pattern matching. The only functions allowed to manipulate the term structure directly are a privileged set given inside the definition. These two implementations can be turned into abstype definitions as follows:

(* An abstract implementation of natural numbers using integers *)
 
abstype nat = Nat of int
with
   exception Pred;
 
   val zero = Nat 0;
 
   fun iszero n = (n = zero);

   fun succ (Nat n) = Nat (n+1);
 
   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);
 
   fun add_nat (Nat i) (Nat j) = Nat (i+j);
 
   fun mult_nat (Nat i) (Nat j) = Nat (i * j);

   fun nat_to_int (Nat n) = n;
end;

(* An abstract implementation of natural numbers using unit lists *)
 
abstype nat = Nat of unit list
with
   exception Pred;
 
   val zero = Nat [];
 
   fun iszero n = (n = zero);

   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun add_nat (Nat []) (Nat j) = Nat j
     | add_nat i j = succ (add_nat (pred i) j);
 
   fun mult_nat (Nat []) (Nat j) = zero
     | mult_nat i j = add_nat j (mult_nat (pred i) j);

   fun nat_to_int (Nat n) = length n;
end;

One important point about abstract types is that, outside their own declarations, they are never equality types. (Inside the with part, as iszero shows, = can still be used on the representation.) Since it might be possible to build a version of a given abstract data type using an underlying type that is not an equality type, the system cannot allow a situation where one implementation is an equality type while another is not. Therefore, the lowest common denominator is assumed.

You may, of course, provide an equality function for your type inside the abstype declaration, but that is up to you as the programmer. Standard ML does not let you overload the built-in = operator, so clients will have to call your function by name.
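From a client's point of view, the difference looks something like the sketch below: a trimmed version of the integer-based abstype with a hypothetical eq_nat added. The commented-out lines show the kinds of code the abstype is meant to reject; they are left as comments so that the rest compiles.

```sml
abstype nat = Nat of int
with
   exception Pred;

   val zero = Nat 0;

   fun succ (Nat n) = Nat (n + 1);

   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);

   fun nat_to_int (Nat n) = n;

   (* Hypothetical addition: explicit equality, since nat is not an
      equality type outside this declaration. *)
   fun eq_nat (Nat i, Nat j) = i = j;
end;

(* Client code can only use the exported operations. *)
val three = succ (succ (succ zero));

(* val bogus = Nat ~3;       rejected: the constructor Nat is not visible here *)
(* val same = three = zero;  rejected: nat is not an equality type here *)
```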

------------------------

This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Wednesday, February 10, 1999 at 1:45:10 PM.
http://cs.hmc.edu/~hodas/courses/cs131/lectures/lecture07.html