Using SML/NJ

This is a guide to editing and executing Standard ML (SML) programs at Harvey Mudd College, using the Standard ML of New Jersey system. This document was adapted by Chris Stone from the guide for Using SML/NJ at Carnegie Mellon University, written by Peter Lee, with extensive contributions by others.

This is not a reference manual for the Standard ML language. If you need a reference manual or a tutorial, you can find several sources of information, both on-line and in hard copy from the CS131 web page.

Interacting with SML/NJ

When you start the SML/NJ system, it loads and responds with a message giving the current version number and then a prompt for user input. The prompt is a single dash ("-").

When prompted, you can type in a top-level declaration. There are several kinds of top-level declarations in SML. For example, the following is a declaration of a function called inc that increments its integer argument. (In these examples, the dash ("-") is the SML/NJ prompt, and the text in teletype font is the user input. In some browsers, user input will also appear in blue text. The italic font is used for the output from the SML/NJ system. The symbol <enter> represents a carriage return on UNIX-based systems or the Enter key on the PC and Macintosh systems.)

- fun inc x = x + 1;<enter>
    val inc = fn : int -> int

The text "fun inc x = x + 1" is the declaration for the inc function. The semicolon (";") is a marker that indicates to the SML/NJ system that it should perform the following actions: elaborate (that is, perform typechecking and other static analyses), compile (to obtain executable machine code), execute, and finally print the result of this declaration. After all of this, the system then prompts for new input and the whole process starts again. This is the so-called "top-level loop". To exit from the SML/NJ system, simply type an end-of-file character (control-d for UNIX) to the prompt.

In the example above, the printed result shows that inc is a function that takes an integer argument and yields an integer result. Actually, it is important for you to know that, in SML, functions are "first-class" values, fundamentally no different than other values such as integers. So, to be more precise, it is better to say that the identifier inc has been bound to a value (which happens to be a function, as denoted by the fn keyword above) of type int -> int.
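
Because functions are ordinary values, they can be passed to other functions and returned as results. The following is a minimal sketch of this; the names twice, incs, and four are ours, chosen only for illustration.

```sml
fun inc x = x + 1

(* Pass inc as an argument to the predefined list function map. *)
val incs = map inc [1, 2, 3]

(* twice takes a function g and returns a new function that applies g two times. *)
fun twice g = fn x => g (g x)

val four = twice inc 2
```

Here incs is bound to [2,3,4] and four to 4; the type of twice is ('a -> 'a) -> 'a -> 'a.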

If we had left out the semicolon, then the elaboration, compilation, execution, and printing would have been deferred and a prompt (this time, an equal sign, "=") would be given, for either a continuation of the declaration of inc or else another top-level declaration. When a semicolon is finally entered (perhaps after several more top-level declarations), all of the declarations since the last semicolon would be processed in sequence. For example:

- fun inc x = x + 1<enter>
= fun f n = (inc n) * 5;<enter>
    val inc = fn : int -> int
    val f = fn : int -> int

In this example, we have defined the inc function as well as a function f that uses inc.

In the interactive top-level loop, the simplest form of input is an expression. For example, after typing in the declarations for inc and f above, we can now call f by typing in:

- f (2+4);<enter>
    val it = 35 : int

Notice that since no identifier is given to bind to the value, the interactive system has chosen the identifier it and bound it to the result of compiling and executing the expression f (2+4).

You might have experience with other languages whose implementations support a similar kind of interactive top-level loop. For example, most implementations of the Lisp, Scheme, and Basic languages support top-level loops. If you have experience with any of these languages, then you might expect that redefining a function will change the binding of the function name, as well as any other functions that call that function. However, in the SML/NJ system, this is not the case. For example, suppose we wish to change the definition of the inc function, so that it increments by two instead of one:

- fun inc x = x + 2;<enter>
    val inc = fn : int -> int

In typical Lisp and Scheme systems, such a redefinition would cause the function f to change as well, since f calls inc. But in the SML/NJ system, f's binding does not change, so in fact referring to f now still yields the original function:

- f (2+4);<enter>
    val it = 35 : int

To understand why the SML/NJ system behaves in this way, consider what would happen if we redefined inc so that it had a type different from int -> int, for example:

- fun inc x = (x mod 2 = 0);<enter>
    val inc = fn : int -> bool

Here, inc has been changed to a function that returns true if and only if its integer argument is even. Now, if f should also be changed to reflect this redefinition (as it would be in Lisp and Scheme systems), it would fail to typecheck. This is not necessarily a bad thing, but at any rate the SML/NJ system does not bother to go back to earlier top-level declarations and re-elaborate them; hence, f's binding is left unchanged.

If you are already familiar with the SML language, then you can think of the sequence of top-level declarations typed into an SML/NJ interactive top-level loop as being in nested let-bindings:

let fun inc x = x + 1 in
  let fun f n = (inc n) * 5 in
    let fun inc x = x + 2 in
      ...
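
The nesting above is left open-ended. Closed off with a final expression (our own addition, purely for illustration), it becomes a complete program showing that f still refers to the first inc:

```sml
val result =
  let fun inc x = x + 1 in
    let fun f n = (inc n) * 5 in
      let fun inc x = x + 2 in
        (* f was elaborated against the first inc; only the direct call sees the second. *)
        (f (2 + 4), inc 6)
      end
    end
  end
```

The value bound to result is the pair (35, 8): the call through f uses the original increment-by-one, while the direct call to inc uses the innermost definition.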

Using Files and the Standard Basis

Instead of typing your program into the interactive top-level, it is more productive to put your program into a file (or set of files) and then load it (them) into the SML/NJ system. The simplest way to do this is to use the built-in function use. For example:

- use "myprog.sml";<enter>
    [opening myprog.sml]
    ...
    val it = () : unit

The use function takes the name of the file (of type string) to load. If the file exists, it is opened and read, with each top-level declaration in the file processed in turn (and the results printed on the standard output). The "result" of the use function is the unit value ("()").

As your programs get larger and the code becomes spread over many modules, you can find it extremely difficult to remember exactly the right order in which to "use" the files. In order to alleviate this problem, the SML/NJ system has a built-in feature called the Compilation Manager, or simply CM, which is highly recommended for programs involving multiple files. CM is a complex system with documentation available online. For most uses the simplest interface is sufficient: simply create a file in the current directory called sources.cm which contains the names of all of your SML source files, listed one per line in any order. Once this file is created, then you can use the function CM.make to load, compile, and execute your system. For example, suppose you have three source files, a.sig, b.sml, and c.sml. Then you can create a file called sources.cm with the following contents:

Group is

a.sig
b.sml
c.sml

Note that it does not matter in what order the file names occur. Once this file has been created, typing the following to the SML/NJ system will do whatever is necessary in order to load your program:

- CM.make();<enter>

The CM.make function will scan all of your source files and calculate the dependencies among them so as to compile and load them in the right order. If CM.make has already been used before to compile and load your program, then it looks to see what files have been changed since the last "make", and then loads and compiles the minimal number of files necessary in order to bring the system up-to-date. After running CM.make, you might notice a new directory in your source file directory. This new directory is used by CM to "remember" the results of the dependency calculation, as well as to store the results of compiling your files so that they don't have to be compiled again (unless, of course, they have been changed).

There is an extensive set of predefined values and functions in the SML/NJ system. This is referred to as the standard basis, or sometimes the pervasive environment. As with CM, extensive documentation for the standard basis is available online. (A book on the standard basis will be published someday.) For dealing with files, the following function is often useful:

OS.FileSys.chDir : string -> unit

This function implements the standard "cd" UNIX command, which changes the current working directory to the directory specified in the string argument. This is useful if you have started the SML/NJ system in a directory different from the one containing your source files.
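
As a small sketch of how chDir might be used, the following moves to the parent directory and back again. It assumes the companion function OS.FileSys.getDir and the constant OS.Path.parentArc from the standard basis; the variable names are ours.

```sml
(* Remember where we started. *)
val startDir = OS.FileSys.getDir ()

(* Move to the parent directory; on UNIX, OS.Path.parentArc is the string "..". *)
val () = OS.FileSys.chDir OS.Path.parentArc
val parentDir = OS.FileSys.getDir ()

(* Return to the starting directory. *)
val () = OS.FileSys.chDir startDir
val restored = OS.FileSys.getDir ()
```

In practice you would more often pass a literal path, such as the directory holding your source files, before calling use or CM.make.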

Another set of predefined variables is useful for controlling the output produced by the SML/NJ system:

Compiler.Control.Print.printDepth : int ref
Compiler.Control.Print.printLength : int ref

These variables control the maximum depth and length to which lists, tuples, and other data structures are to be printed. When a data structure is deeper than printDepth or longer than printLength, the remaining portion of the structure is printed as an ellipsis ("...").

To change the value of one of these variables, an assignment can be used. For example:

- Compiler.Control.Print.printDepth := 10;<enter>

changes the maximum print depth to ten.

The standard basis contains many modules and functions for manipulating values of all of the basic types, including booleans, integers, reals, characters, strings, arrays, and lists. Unfortunately, the SML/NJ system does not provide any kind of browser, so either you need to refer to the written documentation for the standard basis, or use a little bit of a hack in order to see the complete set of basis functions currently supplied in SML/NJ for these types. For example, type the following to the interactive top-level:

- signature S = INTEGER;<enter>

Each set of standard basis functions is encapsulated in an SML module, and each such module has a signature, or "interface", whose name is written entirely in uppercase and refers to the type of values for which the module provides functionality. (Note that SML is case sensitive.) For the integer functions, the signature is called INTEGER. So, the above declaration simply binds the identifier S to the signature INTEGER, which causes the SML/NJ system to respond with a listing of the entire INTEGER interface. (We could have used any other name in place of S.) Other useful signatures include BOOL, REAL, CHAR, STRING, ARRAY, and LIST. For functions that interface to the operating system (such as OS.FileSys.chDir above), see the signature OS (and POSIX, if provided). There are many other useful modules in the standard basis as well.
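
Once you have found a function in one of these interfaces, you call it by qualifying its name with the name of the module. The following sketch uses a handful of functions from the Int, List, String, and Char modules; the particular selection and the variable names are ours.

```sml
val s = Int.toString 42                       (* int to string *)
val n = Int.fromString "17"                   (* string to int option *)
val total = List.foldl (op +) 0 [1, 2, 3]     (* sum the elements of a list *)
val up = String.map Char.toUpper "sml"        (* apply a function to each character *)
val third = List.nth (["a", "b", "c"], 2)     (* zero-based indexing *)
```

Note that Int.fromString returns an option, since not every string denotes an integer.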

Editing Files Using Emacs

I recommend using Emacs to edit your SML programs and also to manage interaction with the SML/NJ system. To do this, you should incorporate the "sml mode" into your emacs startup file. For example, on turing you can simply add the line

(load "/usr/local/sml/sml-mode/sml-mode-setup")

to the .emacs file in your home directory; then the next time you start Emacs, sml-mode will be present. (If you have other SML-related commands already in your .emacs file you may want to comment them out, at least at first.)

With sml-mode, a special editing mode will be invoked any time you edit a file with an appropriate extension (such as ".sml"). As in other special editing modes, using the Tab key or Control-j will cause emacs to attempt to indent your code in a pleasing way; Control-c followed by Tab will indent the current region. Since SML's syntax is rather complex, the sml-mode indentation can be rather haphazard at times. Still, many people find it to be quite useful.

To run SML/NJ inside Emacs, type M-x run-sml (that is, the Meta key and x, followed by the word run-sml). This will start up the SML/NJ system inside an Emacs buffer. There are several useful emacs commands for interacting with the inferior SML shell. You can find documentation for them by hitting Control-h m. Some of the most basic commands are

C-c C-b send the contents of the current buffer to SML ("use" the file; note that the cursor has to be in the buffer containing the ML code and not the buffer where SML is running)
C-c C-r send the code in the current emacs selection (region) to the SML shell
C-c C-s put the cursor in the SML shell

When the cursor is in the buffer containing the running SML process, the following keys are very useful:

M-p Scroll back to the previous command entered at the SML prompt (can be hit repeatedly)
M-n Scroll forward to the next command entered at the SML prompt (opposite of M-p)

Making Sense of Error Messages

As with most compilers, the SML/NJ system often produces error messages that can be hard to decipher. The problem is compounded by the fact that SML supports polymorphic type inference, which makes it very difficult for the compiler to figure out precisely the real source of a type error. On the other hand, once all of the compile-time type errors are removed, it is often the case that the bulk of the bugs have already been stamped out. In practice, SML programs often work the first time, once all of the type errors reported by the compiler have been removed!

Type mismatches

The most common kind of error is the simple type mismatch. For example, suppose we have the following code in a file called myprog.sml:

fun inc x = x + 1
fun f n = inc true

Notice that a semicolon is not needed here, since the end-of-file marker will serve the same purpose. Now, if we load this file, we get the following error message:

- use "myprog.sml";<enter>
    myprog.sml:2.11-2.19 Error: operator and operand don't agree [tycon mismatch]
    operator domain: int
    operand:         bool
    in expression:
       inc true

The error message indicates that the expression inc true, beginning on line 2, column 11 and ending at line 2, column 19, contains a type error. The function inc is being applied to the argument true of type bool, but the domain (argument type) of the inc function is int.
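
How to repair such an error depends on what was intended, which the compiler cannot know. As a sketch, here are two plausible repairs; the function g and its encoding of booleans as 0 and 1 are hypothetical.

```sml
fun inc x = x + 1

(* Repair 1: f was meant to pass its own integer argument along. *)
fun f n = inc n

(* Repair 2: a boolean really was intended, so convert it to an int first. *)
fun g b = inc (if b then 1 else 0)
```

Both definitions typecheck: f has type int -> int and g has type bool -> int.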

If you are using sml-mode in Emacs, then typing C-c C-l in an edit buffer containing the program would cause the SML/NJ system to load the file, and then typing C-c ` would move the edit cursor to the exact point in the program corresponding to this error message.

The value restriction

One of the most fundamental changes in the 1997 revision of the SML language is that it now enforces something called the value restriction. Essentially, this restricts polymorphism to expressions that clearly are values, specifically single identifiers and functions. When this restriction is violated, a diagnostic is given: either the error message "nongeneric type variable" or, as in the example here, a warning that type variables were not generalized. For example, the following program violates the restriction:

fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

val f = map id

The message given is

myprog.sml:6.1-6.15 Warning: type vars not generalized because of
   value restriction are instantiated to dummy types (X1,X2,...)
val f : ?.X1 list -> ?.X1 list

which indicates that the expression map id would be polymorphic, except that it is not syntactically a value and so it can't be given the polymorphic type 'a list -> 'a list. It could correctly be given any of the types int list -> int list, or (string*string) list -> (string*string) list, or any other of an infinite number of non-polymorphic types, but there's no way to know which one the user meant. Therefore the SML/NJ compiler plugs in a "dummy" type (which it prints here as ?.X1) that you definitely didn't mean, in order to force you to go back and specify the type you did mean.

The reasons for the value restriction are beyond the scope of this document, but are explained in several papers as well as most textbooks on Standard ML.
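
Two common ways to satisfy the value restriction in the example above are sketched here. The first rewrites the declaration as a function, which is syntactically a value; the second commits to a single type with an annotation. The names g, ints, strs, and more are ours.

```sml
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* Eta-expansion: f is now declared as a function, so it keeps
   the polymorphic type 'a list -> 'a list. *)
fun f l = map id l

(* Type annotation: g is not polymorphic, but its type is the one we chose. *)
val g : int list -> int list = map id

val ints = f [1, 2, 3]
val strs = f ["a", "b"]
val more = g [4, 5]
```

Because f remains polymorphic, it can be applied to both an int list and a string list, whereas g accepts only int lists.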

Syntax errors

Because the syntax of SML is rather complex, there are several common errors that novices tend to make. One of the most common has to do with the syntax of patterns in clausal-form function declarations and case expressions. Consider the following code:

datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree
fun preorder Leaf(v) = [v]
  | preorder Node(l,r) = preorder l @ preorder r

The SML/NJ system complains vigorously over this:

myprog.sml:4.5-5.49 Error: data constructor Leaf used without argument in pattern
myprog.sml:4.5-5.49 Error: data constructor Node used without argument in pattern
myprog.sml:4.1-5.49 Error: pattern and expression in val rec dec don't agree (tycon mismatch)
  expression:  'Z * 'Z -> ('Z * 'Z) list
  result type:  ('Z * 'Z) list
  in declaration:
    preorder = (fn arg => (fn <pat> => <exp>))

The problem here is that Leaf and Node are patterns that are syntactically separate from, respectively, the (v) and (l,r) patterns. The (admittedly strange) syntax of SML requires extra parenthesization around arguments defined with fun:

fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r
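
To check the corrected definition, it can be applied to a small tree. The sketch below repeats the datatype and function so that it stands alone; the tree t is a made-up example.

```sml
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r

(* A tree with leaves 1, 2, 3 from left to right. *)
val t = Node (Node (Leaf 1, Leaf 2), Leaf 3)

val leaves = preorder t
```

The result bound to leaves is [1,2,3], the leaf values in left-to-right order.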

Another rather confusing part of the syntax has to do with the interaction between case expressions, exception handlers, and clausal-form function declarations. Consider the following function, taken in slightly modified form from the SML/NJ library (described more below):

datatype 'a option = NONE | SOME of 'a
fun filter pred l =
      let fun filterP (x::r, l) =
                case (pred x) of
                   SOME y => filterP(r, y::l)
                 | NONE => filterP(r, l)
            | filterP ([], l) = rev l             (* XXX *)
      in
        filterP (l, [])
      end

In this example, the local function filterP is defined in two clauses, the first handling the case of a non-empty list argument, and the second handling the empty list. In the first clause, a case expression is used. The syntactic ambiguity arises from the fact that it takes too much “lookahead” to figure out whether or not the second clause of filterP, marked above with XXX, is actually a continuation of the case expression. This leads to the following rather cryptic error message:

myprog.sml:8.24 Error: syntax error: replacing  EQUALOP with  DARROW

As before, parenthesization fixes the problem:

fun filter pred l =
      let fun filterP (x::r, l) =
                (case (pred x) of
                    SOME y => filterP(r, y::l)
                  | NONE => filterP(r, l))
            | filterP ([], l) = rev l
      in
        filterP (l, [])
      end

Alternatively, in this example we can also exchange the two clauses of filterP:

fun filter pred l =
      let fun filterP ([], l) = rev l
            | filterP (x::r, l) =
                case (pred x) of
                   SOME y => filterP(r, y::l)
                 | NONE => filterP(r, l)
      in
        filterP (l, [])
      end
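
To see filter at work, here is a small usage sketch. It repeats the parenthesized definition so that it stands alone, and relies on the predefined option type, which has the same constructors as the datatype declared above. The function tenfoldEvens is a hypothetical example of ours.

```sml
fun filter pred l =
      let fun filterP (x::r, l) =
                (case (pred x) of
                    SOME y => filterP(r, y::l)
                  | NONE => filterP(r, l))
            | filterP ([], l) = rev l
      in
        filterP (l, [])
      end

(* Keep the even numbers, multiplied by ten; drop the odd ones. *)
fun tenfoldEvens x = if x mod 2 = 0 then SOME (x * 10) else NONE

val kept = filter tenfoldEvens [1, 2, 3, 4]
```

The value bound to kept is [20,40]. Note that filterP accumulates its results in reverse, which is why the empty-list clause calls rev.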

As with many programming languages, the basic advice to follow is: When in doubt, parenthesize.
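
The same advice applies to nested case expressions, a situation not covered above: without parentheses, the final clause of the outer case would be parsed as a further clause of the inner one. A minimal sketch, using a hypothetical function classify:

```sml
fun classify (x, y) =
      case x of
         0 => (case y of
                  0 => "both zero"
                | _ => "x zero")
       | _ => "x nonzero"

val a = classify (0, 0)
val b = classify (0, 5)
val c = classify (1, 0)
```

With the parentheses in place, the last clause belongs to the case on x, as intended.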

Exporting Heaps

The SML language encourages modularity, and in practice separate modules tend to be placed into separate files. While this is useful during development, it becomes highly inconvenient when you finally "ship" your finished program to your users. The standard way to ship a program, then, is to save an image of the system heap after all of your files have been loaded. This is referred to as "exporting" the heap, and results in a single file that contains the state of your SML world at the time you performed the export operation.

You can export a heap with the function exportML. For example, to save the heap image in a file called mysml, the following should be typed to the SML/NJ prompt:

- SMLofNJ.exportML "mysml";<enter>

This will save the current state of the SML/NJ system into the file mysml. This can then be executed later by running the SML system with the command-line option "@SMLload=mysml". This will restart the SML/NJ system at the same point at which the exportML took place. (Note that exportML is not supported for the Macintosh System 7 version.)

There is also a function called exportFn, which saves an SML state as a function that takes in the shell command-line arguments when restarted. The type of exportFn is

SMLofNJ.exportFn : string * (string * string list -> OS.Process.status) -> unit

The first argument is the name of the file to contain the exported heap image. The second argument is a function that takes the command name and the command-line arguments (as strings) and returns a process-status value (usually OS.Process.success or OS.Process.failure).
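
As a sketch of what such a function might look like, the following defines a main function of the required shape, using the standard basis names OS.Process.status and OS.Process.success. The names describe and main, the message text, and the file name myprog are all hypothetical; the call to exportFn itself is left as a comment because it has the side effect of writing a heap image file.

```sml
(* Build the message separately so that it is easy to test. *)
fun describe (cmdName, args) =
      cmdName ^ " called with " ^ Int.toString (length args) ^ " argument(s)"

(* A function of the shape expected by exportFn: it receives the command
   name and the list of command-line arguments, and returns a status. *)
fun main (cmdName : string, args : string list) : OS.Process.status =
      (print (describe (cmdName, args) ^ "\n");
       OS.Process.success)

(* To export, one would type at the SML/NJ prompt:

   SMLofNJ.exportFn ("myprog", main);
*)
```

Keeping the real work in ordinary functions such as describe makes it possible to try them out at the interactive top-level before exporting.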

Tools

In addition to the standard basis, the SML/NJ system comes with several tools and libraries. The ml-lex and ml-yacc programs perform automatic generation of lexical analyzers and LALR(1) parsers, respectively. Documentation for these and other useful tools can be found at the SML/NJ documentation page.

In addition to the Standard Basis Library, a library of functions automatically available with nearly every SML implementation, SML/NJ has its own specific library of data structures and functions, the SML/NJ Library, whose code on turing can be found in the directory /usr/local/sml/110/src/smlnj-lib/Util.

Finally, concurrency and interaction with the X Window System are supported by the Concurrent ML and eXene extensions to SML.



