


Suppose you want to create a list data type that comes with an
insertion function that puts things into the list in sorted order.
For custom integer lists (or real lists, or string lists) this is
easy. In fact, the data structure itself is essentially the same as
the intlist data type we defined in the last lecture
(although here I am redefining it to more closely match the 'a
lst example).
infix :::;
datatype ordered_intlst = nl | ::: of int * ordered_intlst;
fun insert i nl = i:::nl
  | insert i (h:::t) = if (i < h) orelse (i = h)
                       then i:::(h:::t)
                       else h:::(insert i t);
fun member _ nl = false
  | member i (h:::t) = if (i < h)
                       then false
                       else if (i = h)
                       then true
                       else member i t;
Here we also defined a new version of member that takes
advantage of the ordering to return false earlier than
ordinary member would be able to.
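As a quick check of these definitions, the following sketch builds an ordered list from unsorted input and then queries it. The helpers from_list and to_list are not part of the lecture; they are assumed here purely for illustration, and the datatype and functions are repeated so the fragment stands alone.

```sml
infix :::;
datatype ordered_intlst = nl | ::: of int * ordered_intlst;

fun insert i nl = i:::nl
  | insert i (h:::t) = if (i < h) orelse (i = h)
                       then i:::(h:::t)
                       else h:::(insert i t);

fun member _ nl = false
  | member i (h:::t) = if (i < h) then false
                       else if (i = h) then true
                       else member i t;

(* Hypothetical helpers, for illustration only *)
fun from_list xs = foldl (fn (x, acc) => insert x acc) nl xs;
fun to_list nl = []
  | to_list (h:::t) = h :: to_list t;

val sample = from_list [5, 1, 4, 1, 3];
val sorted = to_list sample;     (* [1, 1, 3, 4, 5] *)
val has4 = member 4 sample;      (* true *)
val has2 = member 2 sample;      (* false: the search stops on reaching 3 *)
```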

Now, the difficulty in extending this to polymorphic lists is that we
used the < operator which is available (overloaded) at
some types, but not arbitrary types. There may be types (like, say,
complex numbers as defined earlier in the class) for which
< is not defined but for which we can formulate an
ordering function, and which we would like to have ordered lists
defined for.
The problem is that we need a different ordering function for each
type. The solution is to carry the ordering function around with the
list. We could attach a copy of the ordering function to each item of
the list, or to each of the cons operators, but that would be
silly. Instead we'll just carry a single copy with each list. So, the
data structure is in two parts, the recursive part that is really the
ordinary 'a list definition and an outer part that
attaches the ordering function.
infix :::;
datatype 'a lst = nl | ::: of 'a * 'a lst;
type 'a ordering = 'a * 'a -> bool;
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;
(Obviously, I could have built this just out of the ordinary 'a
list type, and dispensed with the first definition, but I
wanted to show how to do it from scratch.)
Now, one important difference between 'a ordered_lsts and
ordinary 'a lsts is that we can't have a generic empty
one, since we have to attach an ordering function for the kind of
elements that will eventually be entered into the list. So, for
example, the value nl is an 'a lst, but as
soon as we type olst (nl, ...), we must fill in the
... with some type specific ordering function. So we will
not get an 'a ordered_lst but, for example, an int
ordered_lst.
In general, we will need to either build those empty lists manually, or provide a function that builds them for us, such as:
fun new_lst lt = olst (nl,lt);
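To make the typing point concrete, here is a small sketch that builds two empty lists with different orderings. The names int_lt, len_lt, empty_ints and empty_strs are assumptions introduced for this example; the type definitions and new_lst are repeated so the fragment stands alone.

```sml
infix :::;
datatype 'a lst = nl | ::: of 'a * 'a lst;
type 'a ordering = 'a * 'a -> bool;
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

fun new_lst lt = olst (nl, lt);

(* Hypothetical orderings, for illustration only *)
fun int_lt (x : int, y) = x < y;
fun len_lt (s, t) = size s < size t;   (* strings ordered by length *)

val empty_ints = new_lst int_lt;   (* an int ordered_lst *)
val empty_strs = new_lst len_lt;   (* a string ordered_lst *)
```

Both lists are empty, yet they have different types, because each already carries the ordering for the elements it will eventually hold.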

Now, we can define insert2 and member2 as
follows:
local
  fun ins_aux _ e nl = e:::nl
    | ins_aux lt e (h:::t) = if (lt (e,h)) orelse (e = h)
                             then e:::(h:::t)
                             else h:::(ins_aux lt e t);
in
  fun insert2 e (olst (lst,lt)) = olst (ins_aux lt e lst,lt)
end;
local
  fun memb_aux _ _ nl = false
    | memb_aux lt e (h:::t) = if (lt (e,h))
                              then false
                              else if (e = h)
                              then true
                              else memb_aux lt e t;
in
  fun member2 e (olst (lst,lt)) = memb_aux lt e lst
end;

Given that the function lt is just passed along unchanged
from one call of the aux functions to the next, though, it would be a
little cleaner to use the let trick to get rid of the
unnecessary parameter:
fun insert2 e (olst (lst,lt)) =
  let
    fun ins_aux e nl = e:::nl
      | ins_aux e (h:::t) = if (lt (e,h)) orelse (e = h)
                            then e:::(h:::t)
                            else h:::(ins_aux e t);
  in
    olst (ins_aux e lst,lt)
  end;
fun member2 e (olst (lst,lt)) =
  let
    fun memb_aux _ nl = false
      | memb_aux e (h:::t) = if (lt (e,h))
                             then false
                             else if (e = h)
                             then true
                             else memb_aux e t;
  in
    memb_aux e lst
  end;
This emphasizes that the insertion and member functions for these
ordered lists are really the same as the ordinary functions if we have
a definition for the ordering.
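A short usage sketch may help here: the same insert2 and member2 code sorts ascending or descending depending only on the ordering attached to the list. The helper to_list and the values up and down are assumptions made for this example. Note also that because the aux functions compare elements with =, the element type is inferred as an equality type (''a), so this version would not accept lists of reals, for instance.

```sml
infix :::;
datatype 'a lst = nl | ::: of 'a * 'a lst;
type 'a ordering = 'a * 'a -> bool;
datatype 'a ordered_lst = olst of 'a lst * 'a ordering;

fun new_lst lt = olst (nl, lt);

fun insert2 e (olst (lst, lt)) =
  let
    fun ins_aux e nl = e:::nl
      | ins_aux e (h:::t) = if (lt (e, h)) orelse (e = h)
                            then e:::(h:::t)
                            else h:::(ins_aux e t)
  in
    olst (ins_aux e lst, lt)
  end;

fun member2 e (olst (lst, lt)) =
  let
    fun memb_aux _ nl = false
      | memb_aux e (h:::t) = if (lt (e, h)) then false
                             else if (e = h) then true
                             else memb_aux e t
  in
    memb_aux e lst
  end;

(* Hypothetical helper: flatten to a built-in list for inspection *)
fun to_list (olst (lst, _)) =
  let
    fun walk nl = []
      | walk (h:::t) = h :: walk t
  in
    walk lst
  end;

(* Same elements, two different orderings *)
val up   = foldl (fn (x, l) => insert2 x l) (new_lst (op <)) [3, 1, 2];
val down = foldl (fn (x, l) => insert2 x l) (new_lst (op >)) [3, 1, 2];

val up_items   = to_list up;      (* [1, 2, 3] *)
val down_items = to_list down;    (* [3, 2, 1] *)
val found      = member2 2 down;  (* true *)
val missing    = member2 0 up;    (* false *)
```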
We talked last class about using the SML abstype
definition to create abstract types. But it is an unsatisfying way of
working. First, it is only useful for just that, individual abstract
types. It is not useful for creating packages of types and functions
that somehow go together. Secondly, it is a bit of a development
headache. While you are developing your code it is a lot easier to
work with a datatype than an abstype, since
you can easily type constants in the type and pattern match against
them for testing. So, you are often stuck with developing and testing
the code as a datatype, then rebundling it as an
abstype. The same is true of support code. It is easiest
to debug code without a lot of local definitions, so that
you can call the support functions directly in testing. You are stuck
with adding a lot of local wrappers after the fact.
The ML module system provides a clean mechanism for gathering related pieces of code together and for controlling which parts of the code are accessible from the outside. At the same time, all the internal code can be easily tested at the top-level with little or no extra work required for final packaging.
The SML module system is built on three pieces, which we shall describe in turn: structures, signatures, and functors.

(* A structure for natural numbers using integers *)
structure int_nat =
struct
  datatype nat = Nat of int;
  exception Pred;
  val zero = Nat 0;
  fun is_zero (Nat 0) = true
    | is_zero _ = false;
  fun succ (Nat n) = Nat (n+1);
  fun pred (Nat 0) = raise Pred
    | pred (Nat n) = Nat (n - 1);
  fun add_nat (Nat i) (Nat j) = Nat (i+j);
  fun mult_nat (Nat i) (Nat j) = Nat (i * j);
  fun nat_to_int (Nat n) = n;
end;
(* A structure for natural numbers using unit lists *)
structure unit_nat =
struct
  datatype nat = Nat of unit list;
  exception Pred;
  val zero = Nat [];
  fun is_zero (Nat []) = true
    | is_zero _ = false;
  fun succ (Nat n) = Nat (()::n);
  fun pred (Nat []) = raise Pred
    | pred (Nat (()::prd)) = Nat (prd);
  fun add_nat (Nat []) (Nat j) = Nat j
    | add_nat i j = succ (add_nat (pred i) j);
  fun mult_nat (Nat []) (Nat j) = zero
    | mult_nat i j = add_nat j (mult_nat (pred i) j);
  fun nat_to_int (Nat n) = length n;
end;
In order to refer to an element of a structure, you must give its
fully-qualified name, for example, unit_nat.zero. This
can become a little cumbersome during debugging, so you can
open the structure so that you can get to the names
directly.
Notice that when a structure is opened, ML will report types from that structure using the short form, but if the name is overwritten (such as by opening another structure including a similarly named type), ML will return to using the fully-qualified form.
While there is no way to explicitly close a structure once opened, it is possible to open a structure over only a single expression, as in:
let open unit_nat in succ (succ (succ zero)) end;
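The two styles of access can be compared side by side in the sketch below. It uses a cut-down copy of unit_nat so that the fragment stands alone; the value names two and three are assumptions made for this example.

```sml
(* A cut-down copy of unit_nat, repeated so the fragment stands alone *)
structure unit_nat =
struct
  datatype nat = Nat of unit list
  val zero = Nat []
  fun succ (Nat n) = Nat (() :: n)
  fun nat_to_int (Nat n) = length n
end;

(* Fully-qualified access: every name carries the structure prefix *)
val two = unit_nat.succ (unit_nat.succ unit_nat.zero);

(* Local open: the short names are visible only inside the let body *)
val three = let open unit_nat in nat_to_int (succ two) end;
```

After the let expression ends, succ and nat_to_int are no longer in scope under their short names, which avoids polluting the top-level environment.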
All of the "built-in" functions in ML are actually in structures that are loaded and (mostly) opened in the start-up environment. These are called the pervasives. As we have seen earlier, one trick for disambiguating an overloaded function, instead of giving its type, is to use its fully-qualified name, as in:
fun add_real x y = Real.+ (x,y);
Notice, though, that the qualified name is not an infix operator. We could also define this as:
fun add_real x y =
  let
    open Real
  in
    x + y
  end;
Finally, note that while structures are not first-class (they can't be passed to functions, for example), they can be assigned from one to another. So, for instance, you can say:
structure nat1 = unit_nat;
Most modern languages intended for programming in the large provide for some notion of distinguishing between the implementation of a module and its interface (or specification). In SML, structures specify implementations, while signatures give specifications. For example, if we wish to specify what an implementation of natural numbers must provide, we could say:
(* A specification for implementations of natural numbers *)
signature NAT =
sig
  type nat
  exception Pred
  val zero : nat
  val is_zero : nat -> bool
  val nat_to_int : nat -> int
  val succ : nat -> nat
  val pred : nat -> nat
  val add_nat : nat -> nat -> nat
  val mult_nat : nat -> nat -> nat
end;
We can then use this signature as a way of "specification checking" an
implementation. If we specify that a structure is of signature
NAT, then when it is compiled the system will check that
it has satisfied all of the requirements of the specification-- that
is, that it has defined all the specified types, values, etc:
(* A constrained structure for natural numbers using unit lists *)
structure unit_nat : NAT =
struct
  datatype nat = Nat of unit list;
  exception Pred;
  val zero = Nat [];
  fun is_zero (Nat []) = true
    | is_zero _ = false;
  fun succ (Nat n) = Nat (()::n);
  fun pred (Nat []) = raise Pred
    | pred (Nat (()::prd)) = Nat (prd);
  fun add_nat (Nat []) (Nat j) = Nat j
    | add_nat i j = succ (add_nat (pred i) j);
  fun mult_nat (Nat []) (Nat j) = zero
    | mult_nat i j = add_nat j (mult_nat (pred i) j);
  fun nat_to_int (Nat n) = length n;
end;
If we have particular requirements of the way in which an exported
type is implemented, we can specify that it must, for instance, be a
datatype with a particular set of constructors, or that it be an
abstract type, or an equality type (using the keyword
eqtype in place of type).
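The following sketch illustrates two of these options. The signature and structure names (SHOWN_NAT, EQ_NAT, shown_nat, eq_nat) are hypothetical and chosen only for this example. The first signature specifies the type as a datatype, so its constructor becomes part of the interface; the second specifies it with eqtype, promising clients that = may be used on it.

```sml
(* Hypothetical signature: the representation is part of the interface *)
signature SHOWN_NAT =
sig
  datatype nat = Nat of int
  val zero : nat
  val succ : nat -> nat
end;

structure shown_nat : SHOWN_NAT =
struct
  datatype nat = Nat of int
  val zero = Nat 0
  fun succ (Nat n) = Nat (n + 1)
end;

(* Clients may pattern match on Nat, since the signature names it *)
val shown_nat.Nat one = shown_nat.succ shown_nat.zero;

(* Hypothetical signature: only equality on nat is promised *)
signature EQ_NAT =
sig
  eqtype nat
  val zero : nat
  val succ : nat -> nat
end;

structure eq_nat : EQ_NAT =
struct
  type nat = int
  val zero = 0
  fun succ n = n + 1
end;

(* Equality is guaranteed by the eqtype specification *)
val same = (eq_nat.succ eq_nat.zero = eq_nat.succ eq_nat.zero);
```

A structure that implemented nat with a type that does not admit equality (a function type, say) would fail to match EQ_NAT.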
A signature not only specifies what a structure must implement, but also what names it exports. If a function occurs in a structure that is not in the signature, it is not exported. For example, if we specify a simple version of nats:
(* A specification for stripped-down natural numbers *)
signature SIMPLE_NAT =
sig
  type nat
  exception Pred
  val zero : nat
  val is_zero : nat -> bool
  val succ : nat -> nat
  val pred : nat -> nat
end;
then we can build a constrained version of the unit-list natural numbers as in:
structure nat2 : SIMPLE_NAT = unit_nat;
In this way, the use of signatures effectively eliminates the need for
local definitions. This makes development and testing
much easier.
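The effect of the constraint can be seen in a sketch like the one below. It uses the integer implementation so the fragment stands alone; the structure name small_nat and the value names are assumptions made for this example.

```sml
signature SIMPLE_NAT =
sig
  type nat
  exception Pred
  val zero : nat
  val is_zero : nat -> bool
  val succ : nat -> nat
  val pred : nat -> nat
end;

structure int_nat =
struct
  datatype nat = Nat of int
  exception Pred
  val zero = Nat 0
  fun is_zero (Nat 0) = true
    | is_zero _ = false
  fun succ (Nat n) = Nat (n + 1)
  fun pred (Nat 0) = raise Pred
    | pred (Nat n) = Nat (n - 1)
  fun nat_to_int (Nat n) = n
end;

(* Constrained view: only the names in SIMPLE_NAT are exported *)
structure small_nat : SIMPLE_NAT = int_nat;

val ok = small_nat.is_zero (small_nat.pred (small_nat.succ small_nat.zero));

(* The unconstrained structure keeps every name, so support
   functions remain directly callable during testing *)
val n = int_nat.nat_to_int (int_nat.succ int_nat.zero);

(* Would be rejected by the type checker, since nat_to_int is
   not named in SIMPLE_NAT:
     small_nat.nat_to_int small_nat.zero                      *)
```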
Notice that while the structure only exports the names mentioned in
the signature, the system will still display the constructors used in
building values of types defined in the signature. Although those
constructors cannot be used unless they are also named in the
signature, the types exported by structures are not truly abstract
since the programmer can see their structure. To accomplish full data
abstraction with the module system, the standard method as of SML '96
is to replace the colon used to give a signature constraint with
:>.
For example:
structure nat3 :> SIMPLE_NAT = unit_nat;
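One way to see the difference between the two forms of constraint is sketched below. The signature TINY_NAT and the structures clear_nat and opaque_nat are hypothetical names used only in this example, and the implementation is a cut-down integer version so the fragment stands alone.

```sml
signature TINY_NAT =
sig
  type nat
  val zero : nat
  val is_zero : nat -> bool
  val succ : nat -> nat
end;

structure int_nat =
struct
  datatype nat = Nat of int
  val zero = Nat 0
  fun is_zero (Nat 0) = true
    | is_zero _ = false
  fun succ (Nat n) = Nat (n + 1)
  fun nat_to_int (Nat n) = n
end;

structure clear_nat  :  TINY_NAT = int_nat;   (* transparent constraint *)
structure opaque_nat :> TINY_NAT = int_nat;   (* opaque constraint *)

(* Transparent: clear_nat.nat is still known to be int_nat.nat, so its
   values can be handed to code that sees the representation *)
val leaked = int_nat.nat_to_int (clear_nat.succ clear_nat.zero);

(* Opaque: opaque_nat.nat is a distinct abstract type. The analogous
   line would be rejected by the type checker:
     int_nat.nat_to_int (opaque_nat.succ opaque_nat.zero)            *)
val fine = opaque_nat.is_zero opaque_nat.zero;
```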

This page copyright ©1999 by Joshua S. Hodas. It was built on a Macintosh. Last rebuilt on Monday, February 15, 1999 at 2:15:10 PM.
http://cs.hmc.edu/~hodas/courses/cs131/lectures/lecture08.html