Lecture 08

Harvey Mudd College
Computer Science 131
Programming Languages
Spring Semester 1999

Lecture 08 (2/15/99)

Continuing Introduction to ML
- A More Complex Example of User Defined Types
- The ML Module System
  - Structures
  - Signatures

Suppose you want to create a list data type that comes with an insertion function that puts things into the list in sorted order. For custom integer lists (or real lists, or string lists) this is easy. In fact, the data structure itself is essentially the same as the intlist data type we defined in the last lecture (although here I am redefining it to more closely match the 'a lst example).

infix :::;

datatype ordered_intlst = nl | ::: of int * ordered_intlst;

fun insert i nl = i:::nl
  | insert i (h:::t) = if (i < h) orelse (i = h)
                         then i:::(h:::t)
                         else h:::(insert i t);

fun member _ nl = false
  | member i (h:::t) = if (i < h) 
                         then false
                         else if (i = h)
                                then true
                                else member i t;

Here we also defined a new version of member that takes advantage of the ordering to return false earlier than ordinary member would be able to.
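As a quick check of these definitions, the following sketch builds a small ordered list and queries it. The definitions are repeated so the example can be loaded on its own; the names sample, has_three, and has_four are introduced here only for illustration.

```sml
infix :::;

datatype ordered_intlst = nl | ::: of int * ordered_intlst;

fun insert i nl = i:::nl
  | insert i (h:::t) = if (i < h) orelse (i = h)
                         then i:::(h:::t)
                         else h:::(insert i t);

fun member _ nl = false
  | member i (h:::t) = if (i < h)
                         then false
                         else if (i = h)
                                then true
                                else member i t;

(* Insert 3, then 1, then 5; the result is sorted whatever the order *)
val sample = insert 5 (insert 1 (insert 3 nl));   (* 1:::(3:::(5:::nl)) *)

val has_three = member 3 sample;   (* true *)
val has_four  = member 4 sample;   (* false: the search stops on reaching 5 *)
```

Note that the search for 4 never looks past the 5, which is exactly the early exit described above.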

Now, the difficulty in extending this to polymorphic lists is that we used the < operator, which is available (overloaded) at some types but not at arbitrary types. There may be types (like, say, complex numbers as defined earlier in the class) for which < is not defined, but for which we can formulate an ordering function and would still like to define ordered lists.

The problem is that we need a different ordering function for each type. The solution is to carry the ordering function around with the list. We could attach a copy of the ordering function to each item of the list, or to each of the cons operators, but that would be silly. Instead we'll just carry a single copy with each list. So the data structure is in two parts: a recursive part that is really the ordinary 'a list definition, and an outer part that attaches the ordering function.

infix :::;

datatype 'a lst         = nl | ::: of 'a * 'a lst
type     'a ordering    = 'a * 'a -> bool
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

(Obviously, I could have built this just out of the ordinary 'a list type, and dispensed with the first definition, but I wanted to show how to do it from scratch.)

Now, one important difference between 'a ordered_lsts and ordinary 'a lsts is that we can't have a generic empty one, since we have to attach an ordering function for the kind of elements that will eventually be entered into the list. So, for example, the value nl is an 'a lst, but as soon as we type olst (nl, ...), we must fill in the ... with some type-specific ordering function. So we will not get an 'a ordered_lst but, for example, an int ordered_lst.

In general, we will need to either build those empty lists manually, or provide a function that builds them for us, such as:

fun new_lst lt = olst (nl,lt);
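For instance, new_lst might be used as in the following sketch to build empty lists at two different types. The type definitions are repeated so the example stands alone; empty_ints, empty_strs, and the ordering shorter are hypothetical names written for this example.

```sml
infix :::;

datatype 'a lst         = nl | ::: of 'a * 'a lst
type     'a ordering    = 'a * 'a -> bool
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

fun new_lst lt = olst (nl,lt);

(* An empty list of ints ordered by the built-in < on integers *)
val empty_ints = new_lst (op < : int * int -> bool);

(* An empty list of strings ordered by length rather than alphabetically *)
fun shorter (s1, s2) = String.size s1 < String.size s2;
val empty_strs = new_lst shorter;
```

Here empty_ints is an int ordered_lst and empty_strs is a string ordered_lst; neither is polymorphic, because each ordering fixes the element type.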

Now, we can define insert2 and member2 as follows:

local
   fun ins_aux _ e nl = e:::nl
     | ins_aux lt e (h:::t) = if (lt (e,h)) orelse (e = h)
                                 then e:::(h:::t)
                                 else h:::(ins_aux lt e t);
in
   fun insert2 e (olst (lst,lt)) = olst (ins_aux lt e lst,lt)
end;
 
local
   fun memb_aux _ _ nl = false
     | memb_aux lt e (h:::t) = if (lt (e,h)) 
                                  then false
                                  else if (e = h)
                                         then true
                                         else memb_aux lt e t;
in
   fun member2 e (olst (lst,lt)) = memb_aux lt e lst
end;

Given that the function lt is just passed along unchanged from one call of the aux functions to the next, though, it would be a little cleaner to use the let trick to get rid of the unnecessary parameter:

fun insert2 e (olst (lst,lt)) = 
  let
     fun ins_aux e nl = e:::nl
       | ins_aux e (h:::t) = if (lt (e,h)) orelse (e = h)
                                then e:::(h:::t)
                                else h:::(ins_aux e t);
  in
     olst (ins_aux e lst,lt)
  end;

fun member2 e (olst (lst,lt)) = 
  let
     fun memb_aux _ nl = false
       | memb_aux e (h:::t) = if (lt (e,h)) 
                                 then false
                                 else if (e = h)
                                        then true
                                        else memb_aux e t;
  in
     memb_aux e lst
  end;

This emphasizes that the insertion and member functions for these ordered lists are really the same as the ordinary functions if we have a definition for the ordering.
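The sketch below exercises these functions with a different ordering at the same element type: passing > instead of < keeps an int list in descending order with no change to insert2 or member2. It also includes two hypothetical variants, insert3 and member3, that are not part of the lecture's code. They avoid the = test, which restricts insert2 and member2 to equality types (in SML '97, real is not an equality type), by treating two elements as equal when neither is less than the other. That assumes lt is a strict total ordering.

```sml
infix :::;

datatype 'a lst         = nl | ::: of 'a * 'a lst
type     'a ordering    = 'a * 'a -> bool
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

fun new_lst lt = olst (nl,lt);

(* insert2 and member2 as defined above *)
fun insert2 e (olst (lst,lt)) =
  let
     fun ins_aux e nl = e:::nl
       | ins_aux e (h:::t) = if (lt (e,h)) orelse (e = h)
                                then e:::(h:::t)
                                else h:::(ins_aux e t);
  in
     olst (ins_aux e lst,lt)
  end;

fun member2 e (olst (lst,lt)) =
  let
     fun memb_aux _ nl = false
       | memb_aux e (h:::t) = if (lt (e,h))
                                 then false
                                 else if (e = h)
                                        then true
                                        else memb_aux e t;
  in
     memb_aux e lst
  end;

(* The same code sorts downward when handed > as the ordering *)
val down = insert2 2 (insert2 9 (insert2 5 (new_lst (op > : int * int -> bool))));
val olst (down_items, _) = down;     (* 9:::(5:::(2:::nl)) *)
val found   = member2 5 down;        (* true *)
val missing = member2 7 down;        (* false *)

(* Hypothetical equality-free variants: h goes first only when lt (h,e) *)
fun insert3 e (olst (lst,lt)) =
  let
     fun ins_aux nl = e:::nl
       | ins_aux (h:::t) = if lt (h,e)
                              then h:::(ins_aux t)
                              else e:::(h:::t);
  in
     olst (ins_aux lst,lt)
  end;

(* e matches h when neither lt (e,h) nor lt (h,e) holds *)
fun member3 e (olst (lst,lt)) =
  let
     fun memb_aux nl = false
       | memb_aux (h:::t) = if lt (e,h)
                               then false
                               else if lt (h,e)
                                      then memb_aux t
                                      else true;
  in
     memb_aux lst
  end;

val reals    = insert3 2.5 (insert3 0.5 (new_lst (op < : real * real -> bool)));
val has_half = member3 0.5 reals;    (* true *)
val has_one  = member3 1.0 reals;    (* false *)
```

The same approach would apply to any element type built from reals, provided a suitable ordering function is supplied.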

We talked last class about using the SML abstype definition to create abstract types. But it is an unsatisfying way of working. First, it is only useful for just that, individual abstract types. It is not useful for creating packages of types and functions that somehow go together. Secondly, it is a bit of a development headache. While you are developing your code it is a lot easier to work with a datatype than an abstype, since you can easily type constants in the type and pattern match against them for testing. So, you are often stuck with developing and testing the code as a datatype, then rebundling it as an abstype. The same is true of support code. It is easiest to debug code without a lot of local definitions, so that you can call the support functions directly in testing. You are stuck with adding a lot of local wrappers after the fact.

The ML module system provides a clean mechanism for gathering related pieces of code together and for controlling which parts of the code are accessible from the outside. At the same time, all the internal code can be easily tested at the top-level with little or no extra work required for final packaging.

The SML module system is built on three pieces, which we shall describe in turn: structures, signatures, and functors.

A structure is simply a collection of type and value definitions gathered together so that they may be loaded together and accessed through one name. In particular, by giving its contents unique fully-qualified names, a structure prevents clashes between different definitions of the same name. For example, we can define the following two structures from the natural numbers types built in the last lecture:

(* A structure for natural numbers using integers *)
 
structure int_nat = 
struct
   datatype nat = Nat of int;
 
   exception Pred;
 
   val zero = Nat 0;
   
   fun is_zero (Nat 0) = true
     | is_zero _       = false;
 
   fun succ (Nat n) = Nat (n+1);
 
   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);
 
   fun add_nat (Nat i) (Nat j) = Nat (i+j);
 
   fun mult_nat (Nat i) (Nat j) = Nat (i * j);

   fun nat_to_int (Nat n) = n;
end;

(* A structure for natural numbers using unit lists *)
 
structure unit_nat = 
struct
   datatype nat = Nat of unit list
 
   exception Pred;
 
   val zero = Nat [];
 
   fun is_zero (Nat []) = true
     | is_zero _        = false;
 
   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun add_nat (Nat []) (Nat j) = Nat j
     | add_nat i j = succ (add_nat (pred i) j);
 
   fun mult_nat (Nat []) (Nat j) = zero
     | mult_nat i j = add_nat j (mult_nat (pred i) j);

   fun nat_to_int (Nat n) = length n;
end;

In order to refer to an element of a structure, you must give its fully-qualified name, for example, unit_nat.zero. This can become a little cumbersome during debugging, so you can open the structure so that you can get to the names directly.

Notice that when a structure is opened, ML will report types from that structure using the short form, but if the name is overwritten (such as by opening another structure including a similarly named type), ML will return to using the fully-qualified form.
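A minimal sketch of qualified names, opening, and shadowing follows. To keep it short and self-contained it uses cut-down copies of the two structures above (only zero, succ, and nat_to_int); the value names two, three, one, and so on are introduced only for illustration.

```sml
structure int_nat =
struct
   datatype nat = Nat of int;
   val zero = Nat 0;
   fun succ (Nat n) = Nat (n+1);
   fun nat_to_int (Nat n) = n;
end;

structure unit_nat =
struct
   datatype nat = Nat of unit list;
   val zero = Nat [];
   fun succ (Nat n) = Nat (()::n);
   fun nat_to_int (Nat n) = length n;
end;

(* Without open, every name must be fully qualified *)
val two = int_nat.succ (int_nat.succ int_nat.zero);

(* After open, the short names refer to int_nat's definitions *)
open int_nat;
val three = succ two;
val three_as_int = nat_to_int three;          (* 3 *)

(* Opening unit_nat shadows those short names; int_nat's versions are
   then reachable only through their qualified names *)
open unit_nat;
val one = succ zero;
val one_as_int = nat_to_int one;              (* 1 *)
val still_three = int_nat.nat_to_int three;   (* 3 *)
```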

While there is no way to explicitly close a structure once opened, it is possible to open a structure over only a single expression, as in:

let 
   open unit_nat
in
   succ (succ (succ zero))
end;

All of the "built-in" functions in ML are actually in structures that are loaded and (mostly) opened in the start-up environment. These are called the pervasives. As we have seen earlier, one trick for disambiguating an overloaded function, instead of giving its type, is to use its fully-qualified name, as in:

fun add_real x y = Real.+ (x,y);

Notice, though, that the qualified name is not an infix operator. We could also define this as:

fun add_real x y = let 
                      open Real
                   in
                      x + y
                   end;

Finally, note that while structures are not first-class (they can't be passed to functions, for example), they can be assigned from one to another. So, for instance, you can say:

structure nat1 = unit_nat;

Most modern languages intended for programming in the large provide for some notion of distinguishing between the implementation of a module and its interface (or specification). In SML, structures specify implementations, while signatures give specifications. For example, if we wish to specify what an implementation of natural numbers must provide, we could say:

(* A specification for implementations of natural numbers *)
 
signature NAT = 
sig
   type nat
 
   exception Pred
 
   val zero : nat
 
   val is_zero : nat -> bool;
   val nat_to_int : nat -> int;
   
   val succ : nat -> nat
   val pred : nat -> nat
 
   val add_nat : nat -> nat -> nat
   val mult_nat : nat -> nat -> nat
end;

We can then use this signature as a way of "specification checking" an implementation. If we specify that a structure is of signature NAT, then when it is compiled the system will check that it has satisfied all of the requirements of the specification; that is, that it has defined all of the specified types, values, etc.:

(* A constrained structure for natural numbers using unit lists *)
 
structure unit_nat : NAT = 
struct
   datatype nat = Nat of unit list
 
   exception Pred;
 
   val zero = Nat [];
 
   fun is_zero (Nat []) = true
     | is_zero _        = false;
 
   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun add_nat (Nat []) (Nat j) = Nat j
     | add_nat i j = succ (add_nat (pred i) j);
 
   fun mult_nat (Nat []) (Nat j) = zero
     | mult_nat i j = add_nat j (mult_nat (pred i) j);

   fun nat_to_int (Nat n) = length n;
end;

If we have particular requirements of the way in which an exported type is implemented, we can specify that it must, for instance, be a datatype with a particular set of constructors, or that it be an abstract type, or an equality type (using the keyword eqtype in place of type).
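As a rough illustration of two of these options, the sketch below gives one signature that specifies the type as a datatype with a known constructor, and another that specifies only that it is an equality type. The names DATA_NAT, EQ_NAT, small_nat, data_nat, and eq_nat are hypothetical and exist only for this example.

```sml
(* Hypothetical signature exposing the representation as a datatype *)
signature DATA_NAT =
sig
   datatype nat = Nat of int
   val zero : nat
   val succ : nat -> nat
end;

(* Hypothetical signature promising only that nat supports = *)
signature EQ_NAT =
sig
   eqtype nat
   val zero : nat
   val succ : nat -> nat
end;

(* A small implementation that satisfies both specifications *)
structure small_nat =
struct
   datatype nat = Nat of int
   val zero = Nat 0
   fun succ (Nat n) = Nat (n+1)
end;

structure data_nat : DATA_NAT = small_nat;
structure eq_nat   : EQ_NAT   = small_nat;

(* The datatype specification exports the constructor, so clients can
   pattern match on it *)
val data_nat.Nat n = data_nat.succ data_nat.zero;   (* n is 1 *)

(* The eqtype specification guarantees that = is available on eq_nat.nat *)
val same = (eq_nat.succ eq_nat.zero = eq_nat.succ eq_nat.zero);   (* true *)
```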

A signature not only specifies what a structure must implement, but also what names it exports. If a function occurs in a structure that is not in the signature, it is not exported. For example, if we specify a simple version of nats:

(* A specification for stripped-down natural numbers *)
 
signature SIMPLE_NAT = 
sig
   type nat
 
   exception Pred
 
   val zero : nat
 
   val is_zero : nat -> bool;

   val succ : nat -> nat
   val pred : nat -> nat
end;

then we can build a constrained version of the unit-list natural numbers as in:

structure nat2 : SIMPLE_NAT = unit_nat;

In this way, the use of signatures effectively eliminates the need for local definitions. This makes development and testing much easier.
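To make that concrete, here is a minimal sketch in which a helper function sits at the top level of a structure body, where it can be tested directly during development, but is hidden from clients because the signature does not mention it. The names ORDERED_INTS, ordered_ints, olist, and leq are hypothetical and chosen only for this example.

```sml
signature ORDERED_INTS =
sig
   type olist
   val empty  : olist
   val insert : int -> olist -> olist
   val member : int -> olist -> bool
end;

structure ordered_ints : ORDERED_INTS =
struct
   type olist = int list

   val empty = []

   (* Helper: not named in ORDERED_INTS, so it is not exported *)
   fun leq (i, h) = (i < h) orelse (i = h)

   fun insert i [] = [i]
     | insert i (h::t) = if leq (i,h)
                           then i::h::t
                           else h::(insert i t)

   fun member _ [] = false
     | member i (h::t) = if (i < h)
                           then false
                           else if (i = h)
                                  then true
                                  else member i t
end;

val s = ordered_ints.insert 2 (ordered_ints.insert 7 ordered_ints.empty);
val has_seven = ordered_ints.member 7 s;   (* true *)
val has_three = ordered_ints.member 3 s;   (* false *)
```

Outside the structure, ordered_ints.leq is not available, so no local wrapper is needed to hide it.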

Notice that while the structure only exports the names mentioned in the signature, the system will still display the constructors used in building values of types defined in the signature. Although those constructors cannot be used unless they are also named in the signature, the types exported by structures are not truly abstract, since the programmer can see their representation. To accomplish full data abstraction with the module system, the standard method as of SML '96 is to replace the colon used to give a signature constraint with :>. For example:

structure nat3 :> SIMPLE_NAT = unit_nat;
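The difference between the two forms of constraint can be seen in a sketch like the following. It repeats SIMPLE_NAT and a trimmed copy of the unit-list structure so that it stands alone; the value names two and back_to_zero are introduced only for illustration.

```sml
signature SIMPLE_NAT =
sig
   type nat
   exception Pred
   val zero : nat
   val is_zero : nat -> bool
   val succ : nat -> nat
   val pred : nat -> nat
end;

structure unit_nat =
struct
   datatype nat = Nat of unit list
   exception Pred;
   val zero = Nat [];
   fun is_zero (Nat []) = true
     | is_zero _        = false;
   fun succ (Nat n) = Nat (()::n);
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
   fun nat_to_int (Nat n) = length n;
end;

structure nat2 :  SIMPLE_NAT = unit_nat;
structure nat3 :> SIMPLE_NAT = unit_nat;

(* With the plain colon, nat2.nat is still known to be unit_nat.nat,
   so a nat2 value can be handed to a unit_nat function *)
val two = unit_nat.nat_to_int (nat2.succ (nat2.succ nat2.zero));   (* 2 *)

(* With :>, nat3.nat is a new abstract type: the corresponding expression
      unit_nat.nat_to_int (nat3.succ nat3.zero)
   is rejected by the type checker, and nat3 values can be used only
   through the operations named in SIMPLE_NAT *)
val back_to_zero = nat3.is_zero (nat3.pred (nat3.succ nat3.zero)); (* true *)
```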

This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, February 15, 1999 at 2:15:10 PM.
http://cs.hmc.edu/~hodas/courses/cs131/lectures/lecture08.html
