Using SML/NJ
This is a guide to editing and executing Standard ML (SML) programs at Harvey Mudd College, using the Standard ML of New Jersey system. This document was adapted by Chris Stone from the guide for Using SML/NJ at Carnegie Mellon University, written by Peter Lee, with extensive contributions by others.
This is not a reference manual for the Standard ML language. If you need a reference manual or a tutorial, you can find several sources of information, both on-line and in hard copy from the CS131 web page.
Interacting with SML/NJ
When you start the SML/NJ system, it loads and responds with a message giving the current version number and then a prompt for user input. The prompt is a single dash ("-").
When prompted, you can type in a top-level declaration. There are
several kinds of top-level declarations in SML. For example, the following is
a declaration of a function called inc that increments its integer
argument. (In these examples, the dash ("-") is the SML/NJ prompt,
the text after it, up to and including the semicolon, is the user input,
and the rest is the output from the SML/NJ system. User input is
terminated with a carriage return on UNIX-based systems or the Enter key
on PC and Macintosh systems.)
- fun inc x = x + 1;
val inc = fn : int -> int
The text "fun inc x = x + 1" is the declaration for
the inc function. The semicolon (";") is
a marker that indicates to the SML/NJ system that it should perform the following
actions: elaborate (that is, perform typechecking and other static
analyses), compile (to obtain executable machine code), execute,
and finally print the result of this declaration. After all of this,
the system then prompts for new input and the whole process starts again. This
is the so-called "top-level loop". To exit from the SML/NJ system,
simply type an end-of-file character (control-d for UNIX) to the prompt.
In the example above, the printed result shows that inc is a function
that takes an integer argument and yields an integer result. Actually, it is
important for you to know that, in SML, functions are "first-class"
values, fundamentally no different than other values such as integers. So, to
be more precise, it is better to say that the identifier inc has been bound to a value (which
happens to be a function, as denoted by the fn keyword above) of
type int -> int.
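To make the "first-class" point concrete, here is a minimal sketch. The names inc', twice, addTwo, five, and results are illustrative assumptions and do not appear elsewhere in this guide. It shows a function bound with val, passed as an argument, returned as a result, and stored in a list.

```sml
(* inc as in the text, plus an equivalent binding of an anonymous function *)
fun inc x = x + 1
val inc' = fn x => x + 1

(* twice (hypothetical helper) takes a function and returns a new function *)
fun twice f = fn x => f (f x)

val addTwo = twice inc      (* addTwo : int -> int *)
val five = addTwo 3         (* applies inc twice to 3 *)

(* functions can be stored in data structures like any other value *)
val results = map (fn g => g 10) [inc, inc', addTwo]
```

Typing these declarations at the prompt should show each function printed as fn together with its type, just like inc above.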
If we had left out the semicolon, then the elaboration, compilation, execution,
and printing would have been deferred and a prompt (this time, an equal sign,
"=") would be given, for either a continuation of the declaration
of inc or else another top-level declaration. When a semicolon
is finally entered (perhaps after several more top-level declarations), all
of the declarations since the last semicolon would be processed in sequence.
For example:
- fun inc x = x + 1
= fun f n = (inc n) * 5;
val inc = fn : int -> int
val f = fn : int -> int
In this example, we have defined the inc
function as well as a function f
that uses inc.
In the interactive top-level loop, the simplest form of input
is an expression. For example, after typing in the declarations
for inc and f above, we can now call f by typing in:
- f (2+4);
val it = 35 : int
Notice that since no identifier is given to bind to the value,
the interactive system has chosen the identifier it and
bound it to the result of compiling and executing the expression f (2+4).
You might have experience with other languages whose implementations support
a similar kind of interactive top-level loop. For example, most implementations
of the Lisp, Scheme, and Basic languages support top-level loops. If you have
experience with any of these languages, then you might expect that redefining
a function will change the binding of the function name, as well as any other
functions that call that function. However, in the SML/NJ system, this is not
the case. For example, suppose we wish to change the definition of the inc
function, so that it increments by two instead of one:
- fun inc x = x + 2;
val inc = fn : int -> int
In typical Lisp and Scheme systems, such a redefinition would cause the function
f to change as well, since f calls inc.
But in the SML/NJ system, f's binding does not change, so in fact
referring to f now still yields the original function:
- f (2+4);
val it = 35 : int
To understand why the SML/NJ system behaves in this way, consider what would
happen if we redefined inc so that it had a type different from
int -> int, for example:
- fun inc x = (x mod 2 = 0);
val inc = fn : int -> bool
Here, inc has been changed to a function that returns true
if and only if its integer argument is even. Now, if f should also
be changed to reflect this redefinition (as it would be in Lisp and Scheme systems),
it would fail to typecheck. This is not necessarily a bad thing, but at any
rate the SML/NJ system does not bother to go back to earlier top-level declarations
and re-elaborate them; hence, f's binding is left unchanged.
If you are already familiar with the SML language, then you can think of the sequence of top-level declarations typed into an SML/NJ interactive top-level loop as being in nested let-bindings:
let fun inc x = x + 1 in
  let fun f n = (inc n) * 5 in
    let fun inc x = x + 2 in
      ...
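The nested-let reading can be checked directly. The sketch below completes the nesting with a body of my own choosing (the pair expression and the names viaF and viaNewInc are assumptions, not part of the original example). It evaluates both f and the newest inc in the innermost scope.

```sml
val (viaF, viaNewInc) =
  let fun inc x = x + 1 in
    let fun f n = (inc n) * 5 in
      let fun inc x = x + 2 in
        (* f still refers to the first inc; the name inc now means the second *)
        (f (2+4), inc (2+4))
      end
    end
  end
```

Here viaF is computed with the original inc, because f captured that binding when it was declared, while viaNewInc uses the later definition.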
Using Files and the Standard Basis
Instead of typing your program into the interactive top-level,
it is more productive to put your program into a file (or set of
files) and then load it (them) into the SML/NJ system. The
simplest way to do this is to use the built-in function use. For example:
- use "myprog.sml";
[opening myprog.sml]
...
val it = () : unit
The use function
takes the name of the file (of type string) to load. If
the file exists, it is opened and read, with each top-level
declaration in the file processed in turn (and the results
printed on the standard output). The "result" of the use function is the unit value
("()").
As your programs get larger and the code becomes spread over
many modules, it can become extremely difficult to remember
exactly the right order in which to "use"
the files. In order to alleviate this problem, the SML/NJ system
has a built-in feature called the Compilation Manager, or simply
CM, which is highly recommended for programs involving multiple files. CM
is a complex system with documentation available online.
For most uses the simplest interface is sufficient: simply create
a file in the current directory called sources.cm
which contains the names of all of your SML source files, listed
one per line in any order. Once this file is created, then you
can use the function CM.make
to load, compile, and execute your system. For example, suppose
you have three source files, a.sig,
b.sml, and c.sml. Then you can create a
file called sources.cm
with the following contents:
Group is
  a.sig
  b.sml
  c.sml
Note that it does not matter in what order the file names occur. Once this file has been created, typing the following to the SML/NJ system will do whatever is necessary in order to load your program:
-CM.make();
The CM.make function
will scan all of your source files and calculate the
dependencies among them so as to compile and load them in the
right order. If CM.make
has already been used before to compile and load your program,
then it looks to see what files have been changed since the last
"make", and then loads and compiles the minimal number
of files necessary in order to bring the system up-to-date. After
running CM.make, you
might notice a new directory in your source file directory. This
new directory is used by CM to "remember" the results
of the dependency calculation, as well as to store the results of
compiling your files so that they don't have to be compiled again
(unless, of course, they have been changed).
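As a rough illustration of the kind of dependencies CM works out, the three files might look like the sketch below. The contents are entirely hypothetical (COUNTER, Counter, and Main are invented names), and the files are shown concatenated in one block so that the example is self-contained.

```sml
(* a.sig (hypothetical contents): an interface *)
signature COUNTER =
sig
  type t
  val zero  : t
  val next  : t -> t
  val toInt : t -> int
end

(* b.sml (hypothetical contents): mentions COUNTER, so it depends on a.sig *)
structure Counter :> COUNTER =
struct
  type t = int
  val zero = 0
  fun next n = n + 1
  fun toInt n = n
end

(* c.sml (hypothetical contents): mentions Counter, so it depends on b.sml *)
structure Main =
struct
  val three =
    Counter.toInt (Counter.next (Counter.next (Counter.next Counter.zero)))
end
```

With use, these would have to be loaded in the order a.sig, b.sml, c.sml. With CM, the order in sources.cm is irrelevant because the dependencies are recovered from the references between the signature and the structures.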
There is an extensive set of predefined values and functions in the SML/NJ system. This is referred to as the standard basis, or sometimes the pervasive environment. As with CM, extensive documentation for the standard basis is available online. (A book on the standard basis will be published someday.) For dealing with files, the following function is often useful:
OS.FileSys.chDir : string -> unit
This function implements the standard "cd" UNIX command, which changes the current working directory to the directory specified in the string argument. This is useful if you have started the SML/NJ system in a directory different from the one containing your source files.
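A small sketch of how this might be used, assuming the companion basis function OS.FileSys.getDir (which returns the current working directory as a string). The names startDir, parentDir, and restored are illustrative.

```sml
(* remember where we started *)
val startDir = OS.FileSys.getDir ()

(* move to the parent directory, as "cd .." would *)
val () = OS.FileSys.chDir ".."
val parentDir = OS.FileSys.getDir ()

(* and go back again *)
val () = OS.FileSys.chDir startDir
val restored = (OS.FileSys.getDir () = startDir)
```

In practice you would pass the path of the directory holding your source files instead of "..".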
Two other predefined variables are useful for controlling the output produced by the SML/NJ system:
Compiler.Control.Print.printDepth : int ref
Compiler.Control.Print.printLength : int ref
These variables control the maximum depth and length to which
lists, tuples, and other data structures are to be printed. When
a data structure is deeper than printDepth
or longer than printLength,
the remaining portion of the structure is printed as an ellipsis
("...").
To change the value of one of these variables, an assignment can be used. For example:
-Compiler.Control.Print.printDepth := 10;
changes the maximum print depth to ten.
The standard basis contains many modules and functions for manipulating values of all of the basic types, including booleans, integers, reals, characters, strings, arrays, and lists. Unfortunately, the SML/NJ system does not provide any kind of browser, so you need either to refer to the written documentation for the standard basis or to use a small hack in order to see the complete set of basis functions that SML/NJ currently supplies for these types. For example, type the following to the interactive top-level:
-signature S = INTEGER;
Each set of standard basis functions is encapsulated in an SML
module, and each such module has a signature, or
"interface", whose name is written entirely in
uppercase and refers to the type of values for which the module
provides functionality. (Note that SML is case sensitive.) For
the integer functions, the signature is called INTEGER. So, the above
declaration simply binds the identifier S
to the signature INTEGER, which causes the
SML/NJ system to respond with a listing of the entire INTEGER interface. (We
could have used any name besides S.)
Other useful signatures include BOOL, REAL, CHAR, STRING, ARRAY, and LIST.
For functions that interface to the operating system (such as
OS.FileSys.chDir above), see the signature OS (and POSIX, if provided).
There are many other useful modules in the standard basis as well.
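Once you know which functions a signature lists, you call them through the corresponding structure (Int, String, List, and so on). The following sketch uses a handful of functions that I believe are part of the Standard Basis Library. The bound names are arbitrary, and an older SML/NJ release may offer a slightly different set.

```sml
val s      = Int.toString 42                  (* integer to string *)
val bigger = Int.max (3, 7)                   (* larger of two integers *)
val n      = String.size "hello"              (* number of characters *)
val upper  = String.map Char.toUpper "sml"    (* apply a char function to each char *)
val total  = List.foldl (op +) 0 [1, 2, 3]    (* sum of a list *)
val neg    = Int.toString (~5)                (* SML writes negation as ~ *)
```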
Editing Files Using Emacs
I recommend using Emacs to edit your SML programs and also to manage interaction with the SML/NJ system. To do this, you should incorporate the "sml mode" into your emacs startup file. For example, on turing you can simply add the line
(load "/usr/local/sml/sml-mode/sml-mode-setup")
to the .emacs file in your home directory; then the next time
you start Emacs, sml-mode will be present. (If you have other SML-related commands
already in your .emacs file you may want to comment them out, at
least at first.)
With sml-mode, a special editing mode will be invoked any time you edit a file
with an appropriate extension (such as ".sml"). As in
other special editing modes, using the Tab key or Control-j will cause emacs
to attempt to indent your code in a pleasing way; Control-c followed by Tab
will indent the current region. Since SML's syntax is rather complex, the sml-mode
indentation can be rather haphazard at times. Still, many people find it to
be quite useful.
To run SML/NJ inside Emacs, type M-x run-sml (that is, the Meta
key and x, followed by the word run-sml). This will
start up the SML/NJ system inside an Emacs buffer. There are several useful
emacs commands for interacting with the inferior SML shell. You can find documentation
for them by hitting Control-h m. Some of the most basic commands are
C-c C-b | send the contents of the current buffer to SML ("use" the file; note that the cursor has to be in the buffer containing the ML code and not the buffer where SML is running)
C-c C-r | send the code in the current emacs selection (region) to the SML shell
C-c C-s | put the cursor in the SML shell
When the cursor is in the buffer containing the running SML process, the following keys are very useful:
M-p | Scroll back to the previous command entered at the SML prompt (can be hit repeatedly)
M-n | Scroll forward to the next command entered at the SML prompt (opposite of M-p)
Making Sense of Error Messages
As with most compilers, the SML/NJ system often produces error messages that can be hard to decipher. The problem is compounded by the fact that SML supports polymorphic type inference, which makes it very difficult for the compiler to figure out precisely the real source of a type error. On the other hand, once all of the compile-time type errors are removed, it is often the case that the bulk of the bugs have already been stamped out. In practice, SML programs often work the first time, once all of the type errors reported by the compiler have been removed!
Type mismatches
The most common kind of error is the simple type mismatch. For
example, suppose we have the following code in a file called myprog.sml:
fun inc x = x + 1
fun f n = inc true
Notice that a semicolon is not needed here, since the end-of-file marker will serve the same purpose. Now, if we load this file, we get the following error message:
- use "myprog.sml";
myprog.sml:2.11-2.19 Error: operator and operand don't agree [tycon mismatch]
  operator domain: int
  operand:         bool
  in expression:
    inc true
The error message indicates that the expression inc true, beginning on line 2, column 11 and
ending at line 2, column 19, contains a type error. The function inc is being applied to the argument true
of type bool, but the domain (argument
type) of the inc function is int.
If you are using SML-mode in Emacs, then typing C-c C-l in an
edit buffer containing the program would cause the SML/NJ system to load the
file, and then typing C-c ` would move the edit cursor to the exact
point in the program corresponding to this error message.
The value restriction
One of the most fundamental changes in the 1997 revision of the SML language is that it now enforces something called the value restriction. Essentially, this restricts polymorphism to expressions that clearly are values, specifically single identifiers and functions. When this restriction is violated, the error message, "nongeneric type variable," is given. For example, the following program results in this error:
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

val f = map id
The message given is
myprog.sml:6.1-6.15 Warning: type vars not generalized because of
   value restriction are instantiated to dummy types (X1,X2,...)
val f : ?.X1 list -> ?.X1 list
which indicates that the expression map id would be polymorphic, except that it is not
syntactically a value and so it can't be given the polymorphic type
'a list -> 'a list. It could correctly be given
any of the types
int list -> int list, or
(string*string) list -> (string*string) list, or
any other of an infinite number of non-polymorphic types, but there's
no way to know which one the user meant. The SML/NJ compiler therefore
plugs in a "dummy" type (which it prints here as
?.X1) that you definitely didn't mean,
to force you to go back and specify the type you did mean.
The reasons for the value restriction are beyond the scope of this document, but are explained in several papers as well as most textbooks on Standard ML.
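Two common ways to get past this warning are sketched below. The names f1, f2, ints, strings, and ints2 are illustrative. The first makes the declaration a syntactic function by adding an explicit argument (often called eta-expansion), which keeps it polymorphic. The second keeps the val form but commits to one particular type with an annotation.

```sml
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* 1. Add the argument explicitly: a fun declaration is syntactically a
      value, so f1 receives the polymorphic type 'a list -> 'a list. *)
fun f1 l = map id l

(* 2. Keep the val binding, but state the one type you actually want. *)
val f2 : int list -> int list = map id

val ints    = f1 [1, 2, 3]       (* f1 used at int *)
val strings = f1 ["a", "b"]      (* f1 used at string *)
val ints2   = f2 [4, 5]          (* f2 works only on int lists *)
```

The first form is usually preferable when you really do want to use the function at several types.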
Syntax errors
Because the syntax of SML is rather complex, there are several common errors that novices tend to make. One of the most common has to do with the syntax of patterns in clausal-form function declarations and case expressions. Consider the following code:
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

fun preorder Leaf(v) = [v]
  | preorder Node(l,r) = preorder l @ preorder r
The SML/NJ system complains vigorously over this:
myprog.sml:4.5-5.49 Error: data constructor Leaf used without argument in pattern
myprog.sml:4.5-5.49 Error: data constructor Node used without argument in pattern
myprog.sml:4.1-5.49 Error: pattern and expression in val rec dec don't agree (tycon mismatch)
expression: 'Z * 'Z -> ('Z * 'Z) list
result type: ('Z * 'Z) list
in declaration:
preorder = (fn arg => (fn <pat> => <exp>))
The problem here is that Leaf and Node are parsed as patterns that are syntactically
separate from, respectively, the (v) and (l,r) patterns. The (admittedly strange)
syntax of SML requires extra parentheses around constructor patterns that appear
as arguments in a function defined with fun:
fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r
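To confirm that the parenthesized version behaves as intended, it can be applied to a small tree. The tree t and the name leaves below are made up for illustration.

```sml
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

fun preorder (Leaf v) = [v]
  | preorder (Node (l, r)) = preorder l @ preorder r

(* a three-leaf tree:   Node
                       /    \
                    Node    Leaf 3
                   /    \
               Leaf 1  Leaf 2          *)
val t = Node (Node (Leaf 1, Leaf 2), Leaf 3)

val leaves = preorder t    (* leaf values, left to right *)
```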
Another rather confusing part of the syntax has to do with the interaction between case expressions, exception handlers, and clausal-form function declarations. Consider the following function, taken in slightly modified form from the SML/NJ library (described more below):
datatype 'a option = NONE | SOME of 'a

fun filter pred l = let
     fun filterP (x::r, l) =
           case (pred x)
             of SOME y => filterP(r, y::l)
              | NONE => filterP(r, l)
     | filterP ([], l) = rev l      (* XXX *)
     in
       filterP (l, [])
     end
In this example, the local function filterP
is defined in two clauses, the first handling the case of a non-empty list argument,
and the second handling the empty list. In the first clause, a case expression
is used. The syntactic ambiguity arises from the fact that it takes too much
“lookahead” to figure out whether or not the second clause of filterP,
marked above with XXX, is actually a continuation of the case
expression. This leads to the following rather cryptic error message:
myprog.sml:8.24 Error: syntax error: replacing EQUALOP with DARROW
As before, parenthesization fixes the problem:
fun filter pred l = let
     fun filterP (x::r, l) =
           (case (pred x)
             of SOME y => filterP(r, y::l)
              | NONE => filterP(r, l))
     | filterP ([], l) = rev l
     in
       filterP (l, [])
     end
Alternatively, in this example we can also exchange the two
clauses of filterP:
fun filter pred l = let
     fun filterP ([], l) = rev l
       | filterP (x::r, l) =
           case (pred x)
             of SOME y => filterP(r, y::l)
              | NONE => filterP(r, l)
     in
       filterP (l, [])
     end
As with many programming languages, the basic advice to follow is: When in doubt, parenthesize.
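For completeness, here is a sketch of the repaired filter in use. It relies on the built-in option type instead of redeclaring it, renames the accumulator to acc for readability, and introduces a hypothetical predicate halveEven. None of these choices come from the original library code.

```sml
fun filter pred l =
  let
    fun filterP ([], acc) = rev acc
      | filterP (x::r, acc) =
          (case pred x of
             SOME y => filterP (r, y::acc)
           | NONE   => filterP (r, acc))
  in
    filterP (l, [])
  end

(* hypothetical predicate: keep the even numbers, halving each one *)
fun halveEven n = if n mod 2 = 0 then SOME (n div 2) else NONE

val halves = filter halveEven [1, 2, 3, 4, 6]
```

This version both reorders the clauses and parenthesizes the case expression. Either change alone is enough, but doing both costs nothing and keeps the code safe if a clause is added later.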
Exporting Heaps
The SML language encourages modularity, and in practice separate modules tend to be placed into separate files. While this is useful during development, it becomes highly inconvenient when you finally "ship" your finished program to your users. The standard way to ship a program, then, is to save an image of the system heap after all of your files have been loaded. This is referred to as "exporting" the heap, and results in a single file that contains the state of your SML world at the time you performed the export operation.
You can export a heap with the function exportML.
For example, to save the heap image in a file called mysml, the following should be
typed to the SML/NJ prompt:
-SMLofNJ.exportML "mysml";
This will save the current state of the SML/NJ system into the file mysml.
This can then be executed later by running the SML system with the command-line
option, "@SMLload=mysml". This will restart the SML/NJ
system at the same point at which the exportML took place. (Note
that exportML is not supported for the Macintosh System 7 version.)
There is also a function called exportFn,
which saves an SML state as a function that takes in the shell
command-line arguments when restarted. The type of exportFn is
SMLofNJ.exportFn : string * (string * string list -> OS.Process.status) -> unit
The first argument is the name of the file to contain the
exported heap image. The second argument is a function that takes
the command name and the command-line arguments (as strings) and
returns a process-status value (usually OS.Process.success
or OS.Process.failure).
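A function suitable for passing to exportFn might look like the following sketch. The name main and the file name "myprog" are assumptions, and the status type is written OS.Process.status, which is how the Standard Basis names it. The export call itself is left commented out because, as far as I know, exportFn ends the SML/NJ session once the image has been written.

```sml
(* main (hypothetical) receives the command name and the argument list,
   prints them, and reports success to the operating system *)
fun main (progName : string, args : string list) : OS.Process.status =
  ( print ("program: " ^ progName ^ "\n")
  ; app (fn a => print ("  arg: " ^ a ^ "\n")) args
  ; OS.Process.success )

(* Uncomment to write the heap image to a file named after "myprog":
   SMLofNJ.exportFn ("myprog", main);
*)
```

The resulting image would then be started with the same @SMLload option described for exportML, with any further command-line words passed to main as its argument list.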
Tools
In addition to the standard basis, the SML/NJ system comes
with several tools and libraries. The ml-lex and ml-yacc programs
perform automatic generation of lexical analyzers and LALR(1)
parsers, respectively. Documentation for these
and other useful tools can be found at the SML/NJ documentation page.
In addition to the Standard
Basis Library, a library of functions automatically available with nearly
every SML implementation, the SML/NJ
Library provides its own library of data structures and functions, whose
code on turing can be found in the directory /usr/local/sml/110/src/smlnj-lib/Util.
Finally, extensions to SML for concurrency and interaction with the X window system are supported by the Concurrent ML and eXene extensions to SML.


