Lecture 06

Harvey Mudd College
Computer Science 131
Programming Languages
Spring Semester 2000

Continuing Introduction to ML
(User-Defined-Types: TYPEFEST 2000)

User defined types can be categorized pretty much as follows:

Type Identities
Enumerated Types
Constructed Types

The last two are actually defined using the same mechanism; one is just a special case of the other.

You can also categorize the user defined types by distinguishing between:

Types with accessible internal structure
Abstract Types

The simplest user-defined type is the type identity, which is defined with the type declaration.

For example, suppose you are writing a personnel database application, and you are storing names and social security numbers both as strings. It may be handy for you to annotate your functions with more meaningful types than just strings.

To do this you feed ML the following type identity declarations:

type name = string;
type ssn = string;
type database = (name * ssn) list;

Now we can write the function that looks up a person's name based on their social security number:

fun lookup socnum (nil:database) = NONE
  | lookup socnum ((nm,sn)::tail) = 
       if (socnum = sn)
         then SOME nm
         else lookup socnum tail;

ML will attempt to report its function types in terms of the information we've given it.

In general, though, type identities will be most useful when you are designating types and want ML to confirm them, rather than when you are inferring types.
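As a minimal sketch of that idea, the version of lookup below annotates its arguments and result with the identities declared above, so that ML checks the code against them instead of reporting plain strings. The sample database staff and its entries are made up for illustration.

```sml
type name = string;
type ssn = string;
type database = (name * ssn) list;

(* Same logic as lookup above, with the intended types written out. *)
fun lookup (socnum : ssn) (nil : database) : name option = NONE
  | lookup socnum ((nm, sn) :: tail) =
      if socnum = sn then SOME nm else lookup socnum tail;

(* Hypothetical sample data. *)
val staff : database = [("Ada", "111-11-1111"), ("Alan", "222-22-2222")];

val found   = lookup "222-22-2222" staff;   (* SOME "Alan" *)
val missing = lookup "000-00-0000" staff;   (* NONE *)
```

Because name and ssn are both just other names for string, ML will not stop you from passing a name where an ssn is expected; the identities document intent, they do not create new types.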

New types in ML are defined with the datatype declaration.

The simplest kind of datatype is an enumerated type in which you supply the full range of possible values.

For example:

datatype day_of_week = 
             Monday   | Tuesday | Wednesday 
           | Thursday | Friday  | Saturday  | Sunday;

Now, we can define functions in terms of this type. For instance:

fun workday Saturday = false
  | workday Sunday = false
  | workday _ = true;

val weekend = not o workday;
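A quick check of how these behave, repeated here in full so it can be fed to ML on its own. The list of sample days is just for illustration. Since weekend is built by composing not with workday, it is true exactly where workday is false.

```sml
datatype day_of_week =
             Monday   | Tuesday | Wednesday
           | Thursday | Friday  | Saturday  | Sunday;

fun workday Saturday = false
  | workday Sunday = false
  | workday _ = true;

val weekend = not o workday;

val answers = map weekend [Friday, Saturday, Sunday];
(* answers = [false, true, true] *)
```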

While these are similar to the enumerated types of C and Pascal, there are some important differences. First, as you should not be surprised to discover, ML's user-defined enumerations, unlike C's, are not just a set of aliases for small integers. For example, in C++ you can write the following program:

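#include <iostream>
using namespace std;

enum Boolean {False, True};

int main()
{
  Boolean x = True;
  cout << x << endl;   // prints 1

  x = (Boolean) 3;     // the cast forces in a value that names no enumerator
  cout << x << endl;   // prints 3 on typical compilers
}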

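and the program will print 1 and 3. If you leave out the cast in the assignment, the compiler will complain (a standard-conforming C++ compiler rejects the assignment, while C accepts it silently), but the cast is all it takes to get the value in. In ML there is no connection between an enumerated type and any base type.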
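For contrast, here is a rough ML counterpart of the Boolean example. The names boolean, toInt, and fromInt are invented for this sketch; the point is only that any correspondence with integers has to be written out as ordinary functions, and an integer with no counterpart has to be handled explicitly.

```sml
datatype boolean = False | True;

(* Conversions must be spelled out; nothing like a cast exists. *)
fun toInt False = 0
  | toInt True  = 1;

fun fromInt 0 = SOME False
  | fromInt 1 = SOME True
  | fromInt _ = NONE;        (* 3 has no counterpart in boolean *)

val one = toInt True;        (* 1 *)
val bad = fromInt 3;         (* NONE *)

(* val x : boolean = 3;   would be rejected by the type checker *)
```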

The second key difference is that we can't define what Pascal calls range types. That is, the following is illegal:

datatype weekday = Monday..Friday;

Why not? It's not just that there is no implied ordering on the day_of_week type. We also can't do it on the integers (which do have an ordering) as in:

datatype small = 0..255;

Why not?

Another example:

datatype fuzzybool = yes | no | maybe;

fun And (yes,yes) = yes
  | And (yes,no) = no
  | And (yes,maybe) = maybe
  | And (no,yes) = no
  | And (no,no) = no
  | And (no,maybe) = no 
  | And (maybe,yes) = maybe
  | And (maybe,no) = no
  | And (maybe,maybe) = maybe;

infix And;
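Once And has been declared infix, it is written between its operands. The sketch below uses a shorter definition with wildcard patterns, which should give the same results as the nine-clause table above; it is offered as an alternative way to write it, not as a replacement.

```sml
datatype fuzzybool = yes | no | maybe;

(* no on either side wins; yes needs yes on both sides;
   every remaining combination involves a maybe. *)
fun And (no, _)    = no
  | And (_, no)    = no
  | And (yes, yes) = yes
  | And _          = maybe;

infix And;

val a = yes And maybe;         (* maybe *)
val b = maybe And no;          (* no *)
val c = yes And yes And maybe; (* (yes And yes) And maybe = maybe *)
```

A plain infix declaration makes the operator left-associative at the lowest precedence, which is why the last line groups to the left.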

In a structured (constructed) type, constructors are not just constants; they expect arguments.

So, for instance, we could declare a new type for complex numbers built out of two reals as follows:

datatype complex_number = 
           complex of real * real;

The constructor is kind of like a function, but it doesn't do anything other than a type translation. By putting the constructor complex in front of a pair of reals, that pair becomes a complex_number.

Now I can write some functions that work on complex numbers:

fun im (complex (_,imag)) = imag;
fun re (complex (r,_)) = r;   (* r rather than real, which would shadow the built-in real function *)

fun add (complex (real1,imag1))   
        (complex (real2,imag2)) = 
      complex (real1+real2,imag1+imag2);
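Other operations follow the same pattern of taking the pairs apart in the argument patterns and wrapping the result back up with the constructor. As a sketch, here is multiplication (the name mul is an assumption, it does not appear above), with a small usage example.

```sml
datatype complex_number = complex of real * real;

fun re (complex (r, _)) = r;
fun im (complex (_, i)) = i;

(* (a + bi)(c + di) = (ac - bd) + (ad + bc)i *)
fun mul (complex (a, b)) (complex (c, d)) =
      complex (a * c - b * d, a * d + b * c);

val z = mul (complex (1.0, 2.0)) (complex (3.0, 4.0));
(* z is complex (~5.0, 10.0) *)
```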

One important use of constructed types is the creation of what are called labelled unions. For example:

datatype value = Int of int | Real of real 
               | String of string;

fun add (Int i1)    (Int i2)    = Int (i1 + i2)
  | add (Int i)     (Real r)    = Real ((real i) + r)
  | add (Int i)     (String s)  = String ((Int.toString i) ^ s)
  | add (Real r)    (Int i)     = Real ((real i) + r)
  | add (Real r1)   (Real r2)   = Real (r1 + r2)
  | add (Real r)    (String s)  = String ((Real.toString r) ^ s)
  | add (String s)  (Int i)     = String (s ^ (Int.toString i))
  | add (String s)  (Real r)    = String (s ^ (Real.toString r))
  | add (String s1) (String s2) = String (s1 ^ s2);
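One thing a labelled union buys you is the ability to put values of different underlying types in a single list, since they all share the type value. The function show and the list mixed below are made up for this sketch.

```sml
datatype value = Int of int | Real of real
               | String of string;

(* Render any value as a string, dispatching on its label. *)
fun show (Int i)    = Int.toString i
  | show (Real r)   = Real.toString r
  | show (String s) = s;

(* An int, a real, and a string in one list. *)
val mixed = [Int 1, Real 2.5, String "three"];

val shown = map show mixed;
```

Note that the constructors Int, Real, and String do not clash with the library structures of the same names (as in Int.toString), since constructors and structures live in separate name spaces.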

Suppose we want to define something like the option types. We can define an integer option as in:

datatype intoption = NONE | SOME of int;

We could go on from here and define a real option type and a string option type and so on, but it would be very putzy and somehow go against the grain of polymorphism. Instead, SML allows you to define polymorphic structured types:

datatype 'a option = NONE | SOME of 'a;
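The built-in option type has exactly this shape, so the sketch below uses it directly instead of redeclaring it. The helper names safeHead and getOrElse are invented for illustration; the same two functions work at any element type.

```sml
(* Head of a list, or NONE if the list is empty. *)
fun safeHead []       = NONE
  | safeHead (x :: _) = SOME x;

(* Unwrap an option, falling back to a default. *)
fun getOrElse (SOME x) _ = x
  | getOrElse NONE     d = d;

val a = getOrElse (safeHead [1, 2, 3]) 0;                  (* 1 *)
val b = getOrElse (safeHead ([] : string list)) "none";    (* "none" *)
```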

Perhaps the most important kind of user defined type that can be declared is the recursive type. These are used for building dynamic data structures like trees and lists.

First, let's look at defining a type just for integer lists:

datatype intlist = empty 
                 | icons of int * intlist;

Now, we can write any of the traditional list functions to work over this sort of integer list:

fun imemb _ empty = false
  | imemb i (icons (h,t)) = if (i = h)
                              then true
                              else imemb i t;
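Two more functions in the same style, as a sketch (the names ilength and isum, and the list sample, are assumptions of this example). Each has one clause per constructor, and the clause for icons recurs on the tail.

```sml
datatype intlist = empty
                 | icons of int * intlist;

fun ilength empty          = 0
  | ilength (icons (_, t)) = 1 + ilength t;

fun isum empty          = 0
  | isum (icons (h, t)) = h + isum t;

(* The list 1, 2, 3 *)
val sample = icons (1, icons (2, icons (3, empty)));

val n = ilength sample;   (* 3 *)
val s = isum sample;      (* 6 *)
```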

As above, we can extend the idea to polymorphic lists:

datatype 'a lst = empty 
                | cons of 'a * 'a lst;

fun memb _ empty = false
  | memb i (cons (h,t)) = if (i = h)
                            then true
                            else memb i t;
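Trees, mentioned earlier as the other typical use of recursive types, can be declared the same way. The following is only a sketch; the names tree, leaf, node, treeSize, and inorder are chosen for this example.

```sml
datatype 'a tree = leaf
                 | node of 'a tree * 'a * 'a tree;

(* Number of nodes in the tree. *)
fun treeSize leaf             = 0
  | treeSize (node (l, _, r)) = treeSize l + 1 + treeSize r;

(* Elements in left-to-right order. *)
fun inorder leaf             = []
  | inorder (node (l, x, r)) = inorder l @ (x :: inorder r);

val t = node (node (leaf, 1, leaf), 2, node (leaf, 3, leaf));

val n  = treeSize t;   (* 3 *)
val xs = inorder t;    (* [1, 2, 3] *)
```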

We can make it feel more like the built-in list type by using an infix constructor:

infixr 5 :::;   (* right-associative, like the built-in ::, so 1:::2:::nl parses as intended *)

datatype 'a lst = nl | ::: of 'a * 'a lst;

fun memb _ nl = false
  | memb e (h:::t) = if (e = h) 
                       then true
                       else memb e t;
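A short usage sketch. It assumes the constructor is declared right-associative (infixr 5, the same fixity as the built-in ::), so that a list can be written without parentheses; with a plain left-associative infix declaration the chain below would group the wrong way and fail to typecheck. The function toList is invented here to convert back to an ordinary list.

```sml
infixr 5 :::;

datatype 'a lst = nl | ::: of 'a * 'a lst;

(* Convert to a built-in list, one constructor at a time. *)
fun toList nl        = []
  | toList (h ::: t) = h :: toList t;

val xs = 1 ::: 2 ::: 3 ::: nl;    (* 1 ::: (2 ::: (3 ::: nl)) *)
val ys = toList xs;               (* [1, 2, 3] *)
```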

Suppose that we wish to implement a data type to represent the natural numbers (as distinct from the integers).

One option is to build them on top of the integers as follows:

(* An implementation of natural numbers using integers *)
 
datatype nat = Nat of int;
 
exception Pred;
 
val zero = Nat 0;
 
fun isZero n = (n = zero);

fun succ (Nat n) = Nat (n+1);
 
fun pred (Nat 0) = raise Pred
  | pred (Nat n) = Nat (n - 1);
 
fun addNat (Nat i) (Nat j) = Nat (i+j);
 
fun multNat (Nat i) (Nat j) = Nat (i * j);

fun natToInt (Nat n) = n;

Another option is to build a representation based on lists of the appropriate length. Only the length of the list matters, not its contents, so we'll use lists of unit.

(* An implementation of natural numbers using unit lists *)
 
datatype nat = Nat of unit list;
 
exception Pred;
 
val zero = Nat [];
 
fun isZero n = (n = zero);

fun succ (Nat n) = Nat (()::n);
 
fun pred (Nat []) = raise Pred
  | pred (Nat (()::prd)) = Nat (prd);
 
fun addNat (Nat []) (Nat j) = Nat j
  | addNat i j = succ (addNat (pred i) j);
 
fun multNat (Nat []) (Nat j) = zero
  | multNat i j = addNat j (multNat (pred i) j);

fun natToInt (Nat n) = length n;

This will work equally well, just more slowly.
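To see it in action, here is the unit-list version again with a few sample computations (the names two, three, and so on are made up). The calls at the bottom use only zero, succ, pred, addNat, multNat, and natToInt, so they should behave the same against the integer-based version.

```sml
datatype nat = Nat of unit list;

exception Pred;

val zero = Nat [];

fun succ (Nat n) = Nat (() :: n);

fun pred (Nat [])          = raise Pred
  | pred (Nat (() :: prd)) = Nat prd;

fun addNat (Nat []) (Nat j) = Nat j
  | addNat i j = succ (addNat (pred i) j);

fun multNat (Nat []) (Nat j) = zero
  | multNat i j = addNat j (multNat (pred i) j);

fun natToInt (Nat n) = length n;

val two   = succ (succ zero);
val three = succ two;

val five = natToInt (addNat two three);    (* 5 *)
val six  = natToInt (multNat two three);   (* 6 *)

(* Taking the predecessor of zero raises Pred. *)
val failed = (pred zero; false) handle Pred => true;   (* true *)
```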

The problem is that if I give you one of these implementations, you can use it, but you can also directly access the internals of any number you build and play with it yourself. This goes against proper notions of modularity. The solution is the abstype.
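A small sketch of what can go wrong with the datatype version built on integers: because the constructor Nat is visible, a client can build a value that is not a natural number at all, and every function will happily accept it. The name bogus is made up for this example.

```sml
datatype nat = Nat of int;

fun succ (Nat n) = Nat (n + 1);
fun natToInt (Nat n) = n;

(* Nothing stops a client from applying the constructor directly. *)
val bogus = Nat ~3;

val n = natToInt (succ bogus);   (* ~2, a "natural number" below zero *)
```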

An abstype declaration looks much like a datatype declaration, but exports no constructors, so values of that type cannot be taken apart or built directly by other functions. The only functions allowed to manipulate the term structure directly are a privileged set given inside the definition. These two implementations can be turned into abstype definitions as follows:

(* An abstract implementation of natural numbers using integers *)
 
abstype nat = Nat of int
with
   exception Pred;
 
   val zero = Nat 0;
 
   fun isZero n = (n = zero);

   fun succ (Nat n) = Nat (n+1);
 
   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);
 
   fun addNat (Nat i) (Nat j) = Nat (i+j);
 
   fun multNat (Nat i) (Nat j) = Nat (i * j);

   fun natToInt (Nat n) = n;
end;

(* An abstract implementation of natural numbers using unit lists *)
 
abstype nat = Nat of unit list
with
   exception Pred;
 
   val zero = Nat [];
 
   fun isZero n = (n = zero);

   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun addNat (Nat []) (Nat j) = Nat j
     | addNat i j = succ (addNat (pred i) j);
 
   fun multNat (Nat []) (Nat j) = zero
     | multNat i j = addNat j (multNat (pred i) j);

   fun natToInt (Nat n) = length n;
end;
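From the outside, a client can only go through the functions listed between with and end. The sketch below uses a trimmed copy of the integer-based abstype (fewer functions, to keep it short) followed by some client code; the names two, four, and n are made up.

```sml
abstype nat = Nat of int
with
   exception Pred;

   val zero = Nat 0;

   fun succ (Nat n) = Nat (n+1);

   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);

   fun addNat (Nat i) (Nat j) = Nat (i+j);

   fun natToInt (Nat n) = n;
end;

(* Client code: only the exported names are available. *)
val two  = succ (succ zero);
val four = addNat two two;
val n    = natToInt four;      (* 4 *)

(* Both of these are rejected by the compiler:
   val bad  = Nat ~3;          the constructor Nat is not visible here
   val same = (two = two);     nat is not an equality type out here
*)
```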

Note that abstypes are never equality types. Why?

	This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, January 31, 2000.
http://www.cs.hmc.edu/~hodas/courses/cs131/lectures/lecture06s.html
