CS 105

Lab 6: fgrep

fgrep is a simple command that searches one or more files for a given string. In the I/O lecture we saw the “guts” of fgrep,

n = strlen(search_string);
while (1) {
    nbytes = read(fd, buf, sizeof buf);
    if (nbytes <= 0)
        break;      /* End of file or read error */
    for (int i = 0;  i < nbytes - n + 1;  i++) {
        if (strncmp(&buf[i], search_string, n) == 0) {
            /* Print line containing search_string */
        }
    }
}

and discussed the fact that this code fails if the search string spans two I/O buffers.
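One common way around the boundary problem is to carry the last n - 1 bytes of each buffer over to the front of the next one, so that a match straddling two reads is still seen in one piece. The sketch below illustrates that idea only; it is not necessarily the approach the lab expects. The function name stream_contains and its chunk parameter are hypothetical, and the function only reports whether a match exists rather than printing the matching line.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if needle occurs anywhere in the data
 * read from fd, 0 if it does not, and -1 on a read or allocation error.
 * chunk is the number of bytes requested per read(); it is a parameter
 * only so that small values can force a match to straddle two reads.
 */
int stream_contains(int fd, const char *needle, size_t chunk)
{
    size_t n = strlen(needle);
    if (n == 0)
        return 1;                   /* The empty string matches anywhere */
    if (chunk == 0)
        return -1;

    /* Room for up to n - 1 carried-over bytes plus one fresh chunk. */
    char *buf = malloc(n - 1 + chunk);
    if (buf == NULL)
        return -1;

    size_t kept = 0;                /* Bytes carried over from the last read */
    int result = 0;
    for (;;) {
        ssize_t nbytes = read(fd, buf + kept, chunk);
        if (nbytes < 0) {
            result = -1;
            break;
        }
        if (nbytes == 0)
            break;                  /* End of file */
        size_t total = kept + (size_t)nbytes;

        /* Use memcmp: the buffer is not NUL-terminated. */
        for (size_t i = 0; i + n <= total; i++) {
            if (memcmp(buf + i, needle, n) == 0) {
                result = 1;
                goto done;
            }
        }

        /* Keep the final n - 1 bytes: they may be the start of a match. */
        kept = total < n - 1 ? total : n - 1;
        memmove(buf, buf + total - kept, kept);
    }
done:
    free(buf);
    return result;
}
```

Because at most n - 1 bytes are carried over, the carried region can never contain a complete match on its own, so no match is reported twice.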

In this lab you'll create a version of fgrep that doesn't have that major bug and that also offers a number of other features common in well-written Unix filters:

  • “Switches” (options) on the command line can be used to modify the program's behavior in useful ways.
  • Input can come from stdin or from a file.
  • Multiple files can be processed in a single invocation.
  • There is no built-in limit on the size of the file being processed, nor on the length of lines in that file.
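The last item in the list above usually comes down to growing a buffer on demand instead of declaring a fixed-size array. The following is a minimal sketch of that technique using only standard C; the name read_line and its interface are hypothetical, chosen for illustration, and reading one character at a time with fgetc is a simplification rather than a recommendation.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reads one line from in into *buf, growing the
 * buffer with realloc as needed. The trailing newline, if any, is dropped.
 * Returns the line length, or -1 at end of input or on allocation failure.
 * *buf and *cap should start as NULL and 0; the caller frees *buf.
 */
long read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c = fgetc(in);

    if (c == EOF)
        return -1;                  /* Nothing left to read */

    for (;;) {
        /* Ensure room for this character plus the terminating NUL. */
        if (len + 2 > *cap) {
            size_t newcap = *cap ? *cap * 2 : 64;
            char *p = realloc(*buf, newcap);
            if (p == NULL)
                return -1;
            *buf = p;
            *cap = newcap;
        }
        if (c == '\n' || c == EOF)
            break;                  /* End of line, or last line lacks '\n' */
        (*buf)[len++] = (char)c;
        c = fgetc(in);
    }
    (*buf)[len] = '\0';
    return (long)len;
}
```

Doubling the capacity keeps the total copying cost proportional to the line length, and the same buffer can be reused for every line of every file. A final line that does not end in a newline is still returned as a line.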

Overview of a Unix Filter

A “proper” Unix filter program has a number of common characteristics:

  1. By default, it reads from standard input (stdin) and writes to standard output (stdout).
  2. By default, it isn't “chatty”: it does its job without progress reports.
  3. One or more file names can be given on the command line, in which case it reads those files rather than stdin.
  4. Switches (options) are introduced by a single dash (-) followed by a single character, or, alternatively, by two dashes (--) followed by a longer name.
  5. Switches can appear in any order.
  6. Switches always precede file names and other arguments. Switches and arguments can't be intermixed.
  7. The exit status indicates whether the filter succeeded (using a filter-specific definition of “success”).
  8. If the filter is invoked incorrectly, it prints a “usage” message that briefly summarizes the correct invocation.
  9. Errors (including the usage message) are reported to stderr.
  10. A separate manual page thoroughly documents the program. The documentation is explicitly not part of the program itself.

For this lab, we'll ignore the man page requirement, but implement the rest correctly.

Grading

The lab is worth 100 points, scored as follows:

Basic Functionality
40 points for being able to find (and not find) strings in a single file, correctly handling files that do not end in a newline, and generating a correct exit status.
Switches
5 points for each of the four switches you're asked to implement (i.e., 20 points if you implement all four).
Standard Input and Multiple Files
15 points for correctly handling multiple files or no files on the command line (including correct line prefixes).
Long Lines
10 points for handling lines of arbitrary length.
Switch Ordering
5 points for accepting arbitrary switch orders (note that the library functions getopt and getopt_long do this for you, so these are easy points!).
Missing Files
5 points for correctly handling missing files, including producing the proper exit code.
Miscellaneous
5 points for generating correct usage messages.

Steps
