------------------------

Harvey Mudd College
Computer Science 131
Programming Languages
Spring Semester 2000

Lecture 07

------------------------

Suppose you want to create a list data type that comes with an insertion function that puts things into the list in sorted order. For custom integer lists (or real lists, or string lists) this is easy. In fact, the data structure itself is essentially the same as the intlist data type we defined in the last lecture (although here I am redefining it to more closely match the 'a lst example).

   infix :::;

   datatype ordered_intlst = nl | ::: of int * ordered_intlst;

   fun insert i nl = i:::nl
     | insert i (h:::t) = if (i < h) orelse (i = h)
                            then i:::(h:::t)
                            else h:::(insert i t);

   fun member _ nl = false
     | member i (h:::t) = if (i < h) 
                            then false
                            else if (i = h)
                                   then true
                                   else member i t;

Here we also defined a new version of member that takes advantage of the ordering to return false earlier than ordinary member would be able to.
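
As a quick check of the behavior described above, here is a minimal sketch that repeats the definitions so it can be loaded on its own. The helper toList is not part of the lecture code; it is a hypothetical convenience, assumed here only to make the result easy to inspect.

```sml
(* Definitions repeated from above so the sketch stands alone. *)
infix :::;

datatype ordered_intlst = nl | ::: of int * ordered_intlst;

fun insert i nl = i:::nl
  | insert i (h:::t) = if (i < h) orelse (i = h)
                         then i:::(h:::t)
                         else h:::(insert i t);

fun member _ nl = false
  | member i (h:::t) = if (i < h)
                         then false
                         else if (i = h)
                                then true
                                else member i t;

(* Hypothetical helper: convert to a built-in list for inspection. *)
fun toList nl = []
  | toList (h:::t) = h :: toList t;

(* Insert 3, then 1, then 2; the list is kept sorted throughout. *)
val sample = insert 2 (insert 1 (insert 3 nl));

val asList = toList sample;   (* [1,2,3] *)
val has2 = member 2 sample;   (* true *)
val has0 = member 0 sample;   (* false: 0 < 1, so the search stops at the head *)
```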

------------------------

Now, the difficulty in extending this to polymorphic lists is that we used the < operator, which is available (overloaded) at some types but not at arbitrary types.

The problem is that we need a different ordering function for each type. The solution is to carry the ordering function around with the list.

The data structure is in two parts: a recursive part, which is really just the ordinary 'a list definition, and an outer part that attaches the ordering function.

   infix :::;

   datatype 'a lst = nl | ::: of 'a * 'a lst;

   type 'a ordering = 'a * 'a -> bool;

   datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

(Obviously, I could have built this just out of the ordinary 'a list type, and dispensed with the first definition, but I wanted to show how to do it from scratch.)

In general, we will need to either build the empty lists manually, or provide a function that builds them for us, such as:

   fun new_lst lt = olst (nl,lt);

------------------------

Now, we can define insert2 and member2 as follows:

   fun insAux _ e nl = e:::nl
     | insAux lt e (h:::t) = if (lt (e,h)) orelse (e = h)
                                 then e:::(h:::t)
                                 else h:::(insAux lt e t);

   fun insert2 e (olst (lst,lt)) = olst (insAux lt e lst,lt);
 
   fun membAux _ _ nl = false
     | membAux lt e (h:::t) = if (lt (e,h)) 
                                  then false
                                  else if (e = h)
                                         then true
                                         else membAux lt e t;

   fun member2 e (olst (lst,lt)) = membAux lt e lst;
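
To see the ordering function travel with the list, the following sketch repeats the definitions above and builds lists under three different orderings. The helper toList and the value names (ints, desc, words) are hypothetical, introduced only for illustration. Note that insAux and membAux compare elements with =, so the element type must admit equality.

```sml
infix :::;

datatype 'a lst = nl | ::: of 'a * 'a lst;
type 'a ordering = 'a * 'a -> bool;
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

fun new_lst lt = olst (nl,lt);

fun insAux _ e nl = e:::nl
  | insAux lt e (h:::t) = if (lt (e,h)) orelse (e = h)
                            then e:::(h:::t)
                            else h:::(insAux lt e t);

fun insert2 e (olst (lst,lt)) = olst (insAux lt e lst,lt);

fun membAux _ _ nl = false
  | membAux lt e (h:::t) = if (lt (e,h))
                             then false
                             else if (e = h)
                                    then true
                                    else membAux lt e t;

fun member2 e (olst (lst,lt)) = membAux lt e lst;

(* Hypothetical helper: drop the ordering and return a built-in list. *)
fun toList (olst (lst,_)) =
  let
     fun go nl = []
       | go (h:::t) = h :: go t
  in
     go lst
  end;

(* Qualified names such as Int.< are ordinary (non-infix) functions. *)
val ints  = insert2 2 (insert2 1 (insert2 3 (new_lst Int.<)));
val desc  = insert2 2 (insert2 1 (insert2 3 (new_lst Int.>)));
val words = insert2 "pear" (insert2 "apple" (insert2 "fig" (new_lst String.<)));

val intsL  = toList ints;     (* [1,2,3] *)
val descL  = toList desc;     (* [3,2,1] *)
val wordsL = toList words;    (* ["apple","fig","pear"] *)
val found  = member2 2 desc;  (* true *)
val absent = member2 4 desc;  (* false: 4 comes before 3 in this ordering *)
```

The same insert2 and member2 serve all three lists; only the function handed to new_lst differs.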

------------------------

Given that the function lt is just passed along unchanged from one call of the aux functions to the next, though, it would be a little cleaner to use the let trick to get rid of the unnecessary parameter:

   fun insert2 e (olst (lst,lt)) = 
     let
        fun insAux e nl = e:::nl
          | insAux e (h:::t) = if (lt (e,h)) orelse (e = h)
                                 then e:::(h:::t)
                                 else h:::(insAux e t);
     in
        olst (insAux e lst,lt)
     end;

   fun member2 e (olst (lst,lt)) = 
     let
        fun membAux _ nl = false
          | membAux e (h:::t) = if (lt (e,h)) 
                                  then false
                                  else if (e = h)
                                         then true
                                         else membAux e t;
     in
        membAux e lst
     end;

This emphasizes that the insertion and member functions for these ordered lists are really the same as the ordinary functions if we have a definition for the ordering.

------------------------

The ML module system provides a clean mechanism for gathering related pieces of code together and for controlling which parts of the code are accessible from the outside. At the same time, all the internal code can be easily tested at the top-level with little or no extra work required for final packaging.

The SML module system is built on three pieces: structures, signatures, and functors. We have already described structures, the simplest part of the picture, in an earlier lecture. We will review the notion of structure briefly, then discuss signatures and functors, which are the real strength of the module system.

------------------------

A structure is simply a collection of type and value definitions gathered together so that they may be loaded together, and accessed through one name.

For example, we can define the following two structures from the natural numbers types built in the last lecture:

(* A structure for natural numbers using integers *)
 
structure intNat = 
struct
   datatype nat = Nat of int;
 
   exception Pred;
 
   val zero = Nat 0;
   
   fun isZero (Nat 0) = true
     | isZero _       = false;
 
   fun succ (Nat n) = Nat (n+1);
 
   fun pred (Nat 0) = raise Pred
     | pred (Nat n) = Nat (n - 1);
 
   fun addNat (Nat i) (Nat j) = Nat (i+j);
 
   fun multNat (Nat i) (Nat j) = Nat (i * j);

   fun toInt (Nat n) = n;
end;

(* A structure for natural numbers using unit lists *)
 
structure unitNat = 
struct
   datatype nat = Nat of unit list
 
   exception Pred;
 
   val zero = Nat [];
 
   fun isZero (Nat []) = true
     | isZero _        = false;
 
   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun addNat (Nat []) (Nat j) = Nat j
     | addNat i j = succ (addNat (pred i) j);
 
   fun multNat (Nat []) (Nat j) = zero
     | multNat i j = addNat j (multNat (pred i) j);

   fun toInt (Nat n) = length n;
end;

In order to refer to an element of a structure, you must give its fully-qualified name, for example, unitNat.zero. This can become a little cumbersome during debugging, so you can open the structure so that you can get to the names directly.

While there is no way to explicitly close a structure once opened, it is possible to open a structure over only a single expression, as in:

let 
   open unitNat
in
   succ (succ (succ zero))
end;
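
The two access styles can be compared side by side. This is a minimal sketch using a trimmed-down copy of intNat (only the pieces needed here), so that it loads on its own; the value names two and three are hypothetical.

```sml
(* Trimmed copy of intNat: just enough to illustrate name access. *)
structure intNat =
struct
   datatype nat = Nat of int;

   val zero = Nat 0;

   fun succ (Nat n) = Nat (n+1);

   fun toInt (Nat n) = n;
end;

(* Fully-qualified access: every name carries the structure prefix. *)
val two = intNat.toInt (intNat.succ (intNat.succ intNat.zero));

(* Local open: the short names are in scope only between in and end. *)
val three =
  let
     open intNat
  in
     toInt (succ (succ (succ zero)))
  end;

(* Out here, the names bound inside intNat again require the prefix. *)
```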

All of the "built-in" functions in ML are actually in structures that are loaded and (mostly) opened in the start-up environment. These are called the pervasives. As we have seen earlier, one trick for disambiguating an overloaded function, instead of giving its type, is to use its fully-qualified name, as in:

fun addReal x y = Real.+ (x,y);

Notice, though, that the qualified name is not an infix operator. We could also define this as:

fun addReal x y = let 
                      open Real
                   in
                      x + y
                   end;  

Finally, note that while structures are not first-class (they can't be passed to functions, for example), an existing structure can be bound to a new name:

structure nat1 = unitNat;
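
A structure binding of this kind introduces a second name for the same structure rather than a copy, so the types of the two are interchangeable. The sketch below, using a trimmed-down unitNat so it stands alone, mixes the two names freely; the value name n is hypothetical.

```sml
(* Trimmed copy of unitNat: just enough to illustrate the binding. *)
structure unitNat =
struct
   datatype nat = Nat of unit list;

   val zero = Nat [];

   fun succ (Nat n) = Nat (()::n);

   fun toInt (Nat n) = length n;
end;

structure nat1 = unitNat;

(* nat1.nat and unitNat.nat are the same type, so this typechecks. *)
val n = unitNat.toInt (nat1.succ (nat1.succ unitNat.zero));   (* 2 *)
```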

------------------------

Most modern languages intended for programming in the large provide for some notion of distinguishing between the implementation of a module and its interface (or specification). In SML, structures specify implementations, while signatures give specifications.

For example, if we wish to specify what an implementation of natural numbers must provide, we could say:

(* A specification for implementations of natural numbers *)
 
signature NAT = 
sig
   type nat
 
   exception Pred
 
   val zero : nat
 
   val isZero : nat -> bool
   val toInt : nat -> int
   
   val succ : nat -> nat
   val pred : nat -> nat
 
   val addNat : nat -> nat -> nat
   val multNat : nat -> nat -> nat
end;

We can then use this signature as a way of "specification checking" an implementation. If we declare that a structure has signature NAT, then when the structure is compiled the system checks that it satisfies all of the requirements of the specification, that is, that it defines all the specified types, values, and exceptions. For example:

(* A constrained structure for natural numbers using unit lists *)
 
structure unitNat : NAT = 
struct
   datatype nat = Nat of unit list
 
   exception Pred;
 
   val zero = Nat [];
 
   (* isZero is required by NAT, so it must be defined here *)
   fun isZero (Nat []) = true
     | isZero _        = false;
 
   fun succ (Nat n) = Nat (()::n);
 
   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat (prd);
 
   fun addNat (Nat []) (Nat j) = Nat j
     | addNat i j = succ (addNat (pred i) j);
 
   fun multNat (Nat []) (Nat j) = zero
     | multNat i j = addNat j (multNat (pred i) j);

   (* NAT requires toInt : nat -> int, so count the units *)
   fun toInt (Nat n) = length n;
end;
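
For a structure to match NAT it must supply every component at the specified type, including isZero and a toInt that returns an int. The following self-contained sketch pairs the signature with a unit-list structure that meets those requirements and then uses it only through the names NAT provides. The helper fromInt and the values five and six are hypothetical, added for illustration; fromInt assumes a non-negative argument.

```sml
signature NAT =
sig
   type nat
   exception Pred
   val zero : nat
   val isZero : nat -> bool
   val toInt : nat -> int
   val succ : nat -> nat
   val pred : nat -> nat
   val addNat : nat -> nat -> nat
   val multNat : nat -> nat -> nat
end;

structure unitNat : NAT =
struct
   datatype nat = Nat of unit list;

   exception Pred;

   val zero = Nat [];

   fun isZero (Nat []) = true
     | isZero _        = false;

   fun succ (Nat n) = Nat (()::n);

   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat prd;

   fun addNat (Nat []) (Nat j) = Nat j
     | addNat i j = succ (addNat (pred i) j);

   fun multNat (Nat []) (Nat j) = zero
     | multNat i j = addNat j (multNat (pred i) j);

   fun toInt (Nat n) = length n;
end;

(* Hypothetical helper built only from names that NAT exports.
   Assumes n >= 0. *)
fun fromInt 0 = unitNat.zero
  | fromInt n = unitNat.succ (fromInt (n - 1));

val five = unitNat.toInt (unitNat.addNat (fromInt 2) (fromInt 3));    (* 5 *)
val six  = unitNat.toInt (unitNat.multNat (fromInt 2) (fromInt 3));   (* 6 *)
```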

A signature not only specifies what a structure must implement, but also what names it exports. If a structure defines a function that is not named in the signature, that function is not exported. For example, if we specify a simple version of nats:

(* A specification for stripped-down natural numbers *)
 
signature SIMPLENat = 
sig
   type nat
 
   exception Pred
 
   val zero : nat
 
   val isZero : nat -> bool

   val succ : nat -> nat
   val pred : nat -> nat
end;

then we can build a constrained version of the unit-list natural numbers as in:

structure nat2 : SIMPLENat = unitNat;
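
In this way, the use of signatures largely eliminates the need for local definitions: helper functions can be defined and tested at the top-level of a structure and then simply left out of the signature. This makes development and testing much easier.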
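
The effect of the constraint can be seen in a short sketch. It repeats SIMPLENat and a trimmed-down unitNat (only the pieces needed here) so that it loads on its own; the value names are hypothetical.

```sml
signature SIMPLENat =
sig
   type nat
   exception Pred
   val zero : nat
   val isZero : nat -> bool
   val succ : nat -> nat
   val pred : nat -> nat
end;

(* Trimmed, unconstrained unitNat; toInt is defined but not in SIMPLENat. *)
structure unitNat =
struct
   datatype nat = Nat of unit list;

   exception Pred;

   val zero = Nat [];

   fun isZero (Nat []) = true
     | isZero _        = false;

   fun succ (Nat n) = Nat (()::n);

   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat prd;

   fun toInt (Nat n) = length n;
end;

structure nat2 : SIMPLENat = unitNat;

val one = nat2.succ nat2.zero;
val backToZero = nat2.isZero (nat2.pred one);   (* true *)

(* nat2.toInt would be rejected as unbound: toInt is not in SIMPLENat.
   With the plain colon, nat2.nat is still the same type as unitNat.nat,
   so the unconstrained structure can inspect the value. *)
val n = unitNat.toInt one;   (* 1 *)

(* The exception is exported, so it can be handled by its nat2 name. *)
val predFails = (nat2.pred nat2.zero; false) handle nat2.Pred => true;
```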

Notice that while the structure only exports the names mentioned in the signature, the system will still display the constructors used in building values of types specified in the signature. Although those constructors cannot be used through the constrained structure unless they are also named in the signature, the types it exports are not truly abstract, since the programmer can see their representation. To accomplish full data abstraction with the module system, the standard method as of SML '97 is to replace the colon used to give a signature constraint with :>. For example:

structure nat3 :> SIMPLENat = unitNat;
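
The difference from the plain colon shows up as soon as values of the two structures are mixed. The sketch below again repeats SIMPLENat and a trimmed-down unitNat so that it stands alone; the value names two and ok are hypothetical. A top-level such as SML/NJ will typically print a value of the abstract type as - rather than showing its constructors.

```sml
signature SIMPLENat =
sig
   type nat
   exception Pred
   val zero : nat
   val isZero : nat -> bool
   val succ : nat -> nat
   val pred : nat -> nat
end;

(* Trimmed, unconstrained unitNat. *)
structure unitNat =
struct
   datatype nat = Nat of unit list;

   exception Pred;

   val zero = Nat [];

   fun isZero (Nat []) = true
     | isZero _        = false;

   fun succ (Nat n) = Nat (()::n);

   fun pred (Nat []) = raise Pred
     | pred (Nat (()::prd)) = Nat prd;

   fun toInt (Nat n) = length n;
end;

structure nat3 :> SIMPLENat = unitNat;

(* Values of nat3.nat can be built and examined only through nat3. *)
val two = nat3.succ (nat3.succ nat3.zero);
val ok  = nat3.isZero (nat3.pred (nat3.pred two));   (* true *)

(* With :>, nat3.nat is a new abstract type, distinct from unitNat.nat,
   so each of the following would be rejected by the type checker:

     unitNat.toInt two
     nat3.succ unitNat.zero
*)
```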

------------------------

This page copyright ©2000 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, January 31, 2000.
http://www.cs.hmc.edu/~hodas/courses/cs131/lectures/lecture07s.html