
How to Create a Good UNIX Interface

Introduction

This document was written in 1994, to introduce new graduate students and research programmers to some basic principles of good Unix command design. With the advent of the Web, I have decided to make it available to others, in hopes that new researchers won't keep making the same simple mistakes.

Basic Principles

  • Make it a filter
  • Return error codes
  • Handle multiple arguments
  • Report errors
  • Make output easy to parse
  • Make the default case be the simplest
  • Don't make it interactive by default
  • Keep options simple
  • Think about how other programs will use it
  • Read Kernighan and Pike
  • Make it a filter

    Much of the power of UNIX comes from the availability of pipes and the associated concept of a filter. Programs that are written to be filters can be composed with other programs in ways that the original author never expected. Certain rules must be followed to make this possible. The most important of these is that all programs should behave (to the extent appropriate) as filters. This means that, if they accept an input file, they should read it from standard input by default. If, like most programs, they produce output, they should put it on standard output, so that it is easy to process in a pipe or in reverse quotes.

    It is fairly common for programs to accept names of input files on the command line (see below). However, only in very rare circumstances should an output file be given on the command line, rather than by redirecting standard output. The most common exception is when there are multiple output files, for example, those produced by split(1). A much more unusual example is sort(1), which needs the -o switch because it may overwrite one of its input files (something that can't be done safely via redirection) -- but note that sort still writes to stdout by default. A final example is ld(1), which needs to know the name of its output file so that it can set the execute mode bits. If you absolutely have to provide a specifiable output file, please use "-o" as the switch, and write to stdout by default.

    A final point about writing filters is that it is critical to write error messages to stderr, rather than stdout. There are two reasons for this. First, it means that error messages will show up to the user, rather than disappearing down a pipe (this is why stderr was invented). Second, it means that programs that need to parse your filter's output won't get confused by error messages that don't follow the standard output format.

    Return error codes

    A program that doesn't return a proper success/failure status is not only sloppy, it is wrong. Even if your program can't possibly fail, it should still return zero, both to prevent compiler warnings and (far more important) so that things like make won't give up because you unintentionally returned a nonzero status.

    If your program can fail, always return a nonzero status on failure. This makes it much more useful in scripts. When it's not too difficult to do so, return codes should be meaningful, so that callers can distinguish different types of errors.

    Handle multiple arguments

    Many programs accept the names of files to be processed in some manner. If your program processes a file, you should ask yourself whether it would be possible for it to process more than one at a time in some meaningful fashion. If so, take the trouble to write a loop across the arguments! (This also applies to non-file arguments for some programs; see the list of examples below.) This does not mean that you should require an argument; in particular, any program that reads input should be able to read from stdin as well as from a file. One easy way to do this is to write code like the following:

    	FILE *infile;
    	int i;
    	int error_status = 0;

    	if (argc <= 1)
    	    dofile (stdin);
    	else {
    	    for (i = 1;  i < argc;  i++) {
    		infile = fopen (argv[i], "r");
    		if (infile != NULL) {
    		    dofile (infile);
    		    fclose (infile);
    		} else {
    		    fprintf (stderr,
    		      "sample-program:  Couldn't open input file '%s': %s\n",
    		      argv[i], strerror (errno));
    		    error_status = 1;
    		}
    	    }
    	}
    	return error_status;
    

    DO NOT take the lazy way out, writing code that processes a single argument and then exits. Consider the following list of programs, all of which could have been written to take only a single argument (or even to read only from stdin), and think about how much more of a pain Unix would be if that had been done: rm, touch, more, sort, mkdir, cc, ls, mail

    Report errors

    As indicated in the sample code above, you should always check for and report errors, and do so in a meaningful fashion. Don't just exit silently. Don't print "no input file" without any other help. It's usually pretty poor practice to use perror(3) without giving any other information; use strerror(3) instead, as in the example.

    (Strerror(3) is standard C, but it is not always available; under some older systems, such as SunOS 4.1.1, it's just not there. But that's no excuse. It's easy to implement:

    	extern int errno, sys_nerr;
    	extern char *sys_errlist[];
    	char *strerror (num)
    	    int num;
    	{
    	    static char nomsgbuf[32];

    	    if (num == 0)
    		num = errno;
    	    if (num <= 0  ||  num >= sys_nerr) {
    		sprintf (nomsgbuf, "unknown error %d", num);
    		return nomsgbuf;
    	    }
    	    else
    		return sys_errlist[num];
    	}
    

    Try to make error messages as informative as possible. Remember that the cause of the error may be far removed from the symptoms. Also, identify your program in every error message (as in the example). Nothing's so frustrating as getting a message from a 5-program pipeline, and not knowing which program produced it.

    Finally, every program should issue a usage message when invoked incorrectly. You don't have to get fancy, but you should at least give a one-line listing of arguments and options.

    Make output easy to parse

    For a filter to be useful, it must be easy for other programs to parse the output. The Unix standard is that output should consist of single lines containing equal numbers of whitespace-separated fields. Generally, fields should be self-identifying based on their contents, or should be identified by a header line. If it is reasonable for fields to contain whitespace (e.g., the full-name field from the password file), it is acceptable to use a different separator such as a colon, but this should be the exception.

    Multi-line output should be avoided whenever possible, because it is vastly more difficult to parse. Instead, take a cue from ls(1) and ps(1): display the most commonly-needed information by default, and supply switches that can be used to print details if necessary. If you absolutely have to produce multi-line output, every line should begin with a unique tag, so that fields of interest can be selected with a grep(1).

    A simple test for the quality of your output is to pipe your program into "awk '{print $2}'". If you get sensible results, your program is probably easy to use in a more general pipeline.

    Make the default case be the simplest

    When you are whipping up a tool to dump some bit of data for debugging purposes, there is a temptation to print out everything "just in case I ever need to see it." The desire for completeness is admirable, but completeness should not be the default. Overly verbose tools make it hard to pick out the desired output, encourage multiple output lines and unnecessary label fields, and lead to negative command switches ("specify the -d switch if you don't want to see the date").

    Instead, think a bit about what the invoker is most likely to want to see, and make this the default. Ls(1) is a perfect example of this: in the default case, all it displays is the file name. If you're in a hurry, handle only two cases: the default (brief) output, and full output, usually selected by "-l" or "-a". Intermediate switches can be added later if it turns out to be useful to select a subset of the full output.

    Don't make it interactive by default

    As I have said before, the power of Unix is built on filters. With extremely rare exceptions, this means that programs should be sensibly usable from the command line. By default, instructions should be gotten from option switches, not from interactively reading stdin. It is OK, and often even good design, to provide an interactive mode as an option, but the default case should be to perform the actions indicated on the command line and then exit. Interactive-only programs are extremely hard to use in pipelines (and I don't think there's a single interactive Unix program that I have not found occasion to use non-interactively).

    Keep options simple

    As a general rule, each separate behavior of a program should be invoked by a separate option switch. A good example of this is wc(1), which has separate switches to select word, line, or character counts. However, this principle can be carried too far. Consider ls(1), which already has too many options. If every field printed by the "-l" switch had its own command-line option, ls would be almost unusable. Try to find a balance between making an option invoke logically-related functions and having too many options.

    Think about how other programs will use it

    Last, but not at all least, always keep in mind that you are building a tool that will be a component of other tools. Ask yourself how easy it is to invoke your program from a script, how the output could be made easier to process, and what functionality might be useful that you haven't yet provided. If you do these things, your program will have a long and happy life.

    Read Kernighan and Pike

    If you're serious about being a Unix programmer, you will be doing yourself a great favor if you purchase and carefully read a copy of "The Unix Programming Environment" by Brian Kernighan and Rob Pike. This book is a great introduction to shell programming and to the methods of fitting tools together that make Unix such a great system for serious computer scientists.


    Back to Geoff Kuenning's home page.

    This page maintained by Geoff Kuenning.