CS 105

Lab 6: fgrep

In this lab, you'll be writing the C code for fgrep, a simple command that searches one or more files for a given string.

Recall that in the second I/O lecture you saw simple programs that each acted as a Unix filter, reading from standard input and writing to standard output.

In this lab you'll create a version of the Unix utility fgrep that also offers a number of other features that are common in well-written Unix filters, such as

  • “Switches” (options) on the command line can be used to modify the program's behavior in useful ways.
  • Input can come from stdin or from a file.
  • Multiple files can be processed in a single invocation.
  • There is no built-in limit on the size of the file being processed, nor on the length of lines in that file.
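One way to avoid a built-in limit on line length is to let the C library grow the line buffer for you. The sketch below assumes a POSIX system where getline() is available; the helper name longest_line is hypothetical and is used only to illustrate the reading loop, not as part of the lab's required design.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in `in`,
 * not counting the trailing newline (if any). */
size_t longest_line(FILE *in)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of the buffer */
    ssize_t len;
    size_t longest = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* The final line of a file may lack a newline. */
        if (len > 0 && line[len - 1] == '\n')
            len--;
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);          /* one buffer is reused for every line */
    return longest;
}
```

Because getline() reallocates the buffer as needed, the loop never has to guess a maximum line length, and the same buffer is reused from one line to the next.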

Overview of a Unix Filter

A “proper” Unix filter program has a number of common characteristics:

  1. By default, it reads from standard input (stdin) and writes to standard output (stdout).
  2. By default, it isn't “chatty”: it does its job without printing progress reports.
  3. One or more file names can be given on the command line, in which case it reads those files rather than stdin.
  4. Switches (options) are introduced by a single dash (-) followed by a single character, or, alternatively, by two dashes (--) followed by a longer name.
  5. Switches can appear in any order.
  6. Switches always precede file names and other arguments. Switches and arguments can't be intermixed.
  7. The exit status indicates whether the filter succeeded (using a filter-specific definition of “success”).
  8. If the filter is invoked incorrectly, it prints a “usage” message that briefly summarizes the correct invocation.
  9. Errors (including the usage message) are reported to stderr.
  10. A separate manual page thoroughly documents the program. The documentation is explicitly not part of the program itself.

For this lab, we'll ignore the man page requirement, but implement the rest correctly.

Grading

The lab is worth 110 points, scored as follows:

Basic Functionality
40 points for being able to find (and not find) strings in a single file, correctly handling files that do not end in a newline, and generating a correct exit status.
Switches
5 points for each of the four switches you're asked to implement (i.e., 20 points if you implement all four).
Standard Input and Multiple Files
15 points for correctly handling multiple files or no files on the command line (including correct line prefixes).
Long Lines
10 points for handling lines of arbitrary length.
Switch Ordering
5 points for accepting arbitrary switch orders (note that the library functions getopt and getopt_long do this for you, so these are easy points!).
Missing Files
5 points for correctly handling missing files, including producing the proper exit code.
Miscellaneous
5 points for generating correct usage messages.
Coding Style
10 points for writing clean, well-structured, well-commented code that would make any CS 70 grutor proud.

Steps

fgrep, Then and Now

The fgrep command, where f stands for “fixed” (or “fast”), searches through text looking for a fixed string rather than a regular expression.
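To make the distinction concrete, a fixed-string match treats every character of the pattern literally, so a character such as “.” matches only itself. The sketch below uses the standard strstr function; the helper name line_matches is hypothetical, and this is only one possible way to test a line.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `pattern` occurs literally
 * anywhere in `line`, and 0 otherwise. */
int line_matches(const char *line, const char *pattern)
{
    return strstr(line, pattern) != NULL;
}
```

With this helper, the pattern "a.c" matches the line "see a.c for details" but not the line "abc", whereas a regular-expression search would match both.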

On a modern system, “fgrep” is often a link to the grep executable, or a shell script that calls grep with the -F flag and passes your arguments to it.

On the CS servers (running Gentoo Linux), fgrep is a script that looks like

#!/usr/bin/env sh
exec "/bin/grep" -F "$@"

On a Fedora system, it's also a script:

#!/usr/bin/sh
cmd=${0##*/}
echo "$cmd: warning: $cmd is obsolescent; using grep -F" >&2
exec grep -F "$@"

On macOS, running the command file fgrep will tell you that fgrep is a “Mach-O universal binary”, but it's actually a hard link to the grep binary (as are several other grep variants, including egrep, rgrep, bzgrep, bzegrep, bzfgrep, zgrep, zegrep, and zfgrep). In this case, the actual grep program changes its behavior to match that of the name it's invoked by.
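A program can change its behavior this way by inspecting argv[0], the name it was invoked by. The following sketch illustrates the general technique only; it is not taken from any actual grep source, and the enum match_mode type and the helper name mode_for_name are hypothetical.

```c
#include <string.h>

enum match_mode { MODE_REGEX, MODE_FIXED };

/* Hypothetical helper: choose a matching mode from the name the
 * program was invoked by (argv[0]), ignoring any directory prefix. */
enum match_mode mode_for_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    const char *base = slash ? slash + 1 : argv0;

    if (strcmp(base, "fgrep") == 0)
        return MODE_FIXED;
    return MODE_REGEX;
}
```

A main function would call mode_for_name(argv[0]) before parsing its switches, so that a hard link named fgrep behaves like grep -F without any extra arguments.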
