Lab 6: fgrep
In this lab, you'll be writing the C code for fgrep, a simple
command that searches one or more files for a given string.
Recall that in the second I/O lecture you saw simple programs that each acted as a Unix filter, reading from standard input and writing to standard output.
In this lab you'll create a version of the Unix utility fgrep that also offers a number of other features that are common in well-written Unix filters, such as:
- “Switches” (options) on the command line can be used to modify the program's behavior in useful ways.
- Input can come from stdin or from a file.
- Multiple files can be processed in a single invocation.
- There is no built-in limit on the size of the file being processed, nor on the length of lines in that file.
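To make the "no built-in limit on line length" feature concrete, here is a minimal sketch of one way to read lines of any length. It assumes the POSIX function getline is available and that you're allowed to use it; the function name count_matching_lines is hypothetical, and it counts matching lines rather than printing them just to keep the sketch short.

```c
#define _POSIX_C_SOURCE 200809L  /* ask <stdio.h> to declare getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines of `in` that contain `needle`.
 * getline() grows its buffer as needed, so no line is "too long".
 * A final line with no trailing newline is still returned by getline(). */
long count_matching_lines(FILE *in, const char *needle)
{
    assert(in != NULL && needle != NULL);

    char *line = NULL;  /* getline() allocates and reallocates this */
    size_t cap = 0;     /* current capacity of `line` */
    long matches = 0;

    while (getline(&line, &cap, in) != -1) {
        /* strstr() does a fixed-string search, which is all fgrep needs */
        if (strstr(line, needle) != NULL)
            matches++;
    }

    free(line);         /* one buffer is reused for every line */
    return matches;
}
```

Calling count_matching_lines(stdin, "needle") would scan standard input. If getline is off-limits for your solution, the same idea can be built from fgets plus realloc, doubling the buffer whenever a read fills it without reaching a newline.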
Overview of a Unix Filter
A “proper” Unix filter program has a number of common characteristics:
- By default, it reads from standard input (stdin) and writes to standard output (stdout).
- It isn't “chatty”: it does its job without progress reports. (By default.)
- One or more file names can be given on the command line, in which
case it reads those files rather than stdin.
- Switches (options) are introduced by a single dash (-) followed by a single character, or, alternatively, by two dashes (--) followed by a longer name.
- Switches can appear in any order.
- Switches always precede file names and other arguments. Switches and arguments can't be intermixed.
- The exit status indicates whether the filter succeeded (using a filter-specific definition of “success”).
- If the filter is invoked incorrectly, it prints a “usage” message that briefly summarizes the correct invocation.
- Errors (including the usage message) are reported to stderr.
- A separate manual page thoroughly documents the program. The documentation is explicitly not part of the program itself.
For this lab, we'll ignore the man page requirement, but implement
the rest correctly.
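The switch-handling rules above can be sketched with getopt_long. This is only an illustration under stated assumptions: the switches -i/--ignore-case and -n/--line-number, the struct fgrep_options, the function names, and the wording of the usage message are all hypothetical stand-ins, not necessarily the ones this lab asks for.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical set of switches, for illustration only. */
struct fgrep_options {
    bool ignore_case;   /* -i / --ignore-case */
    bool line_numbers;  /* -n / --line-number */
};

/* The usage message is an error report, so it goes to stderr. */
static void print_usage(const char *progname)
{
    fprintf(stderr, "Usage: %s [-i] [-n] string [file ...]\n", progname);
}

/* Parse the switches in argv. Returns the index of the first non-switch
 * argument (the search string), or -1 after printing a usage message. */
int parse_switches(int argc, char *argv[], struct fgrep_options *opts)
{
    static const struct option long_opts[] = {
        {"ignore-case", no_argument, NULL, 'i'},
        {"line-number", no_argument, NULL, 'n'},
        {NULL, 0, NULL, 0}
    };
    assert(opts != NULL);
    opts->ignore_case = false;
    opts->line_numbers = false;

    optind = 1;  /* start scanning at argv[1], even if getopt ran before */
    int c;
    while ((c = getopt_long(argc, argv, "in", long_opts, NULL)) != -1) {
        switch (c) {
        case 'i': opts->ignore_case = true;  break;
        case 'n': opts->line_numbers = true; break;
        default:  /* unknown switch: getopt_long returned '?' */
            print_usage(argv[0]);
            return -1;
        }
    }
    if (optind >= argc) {  /* no search string was given */
        print_usage(argv[0]);
        return -1;
    }
    return optind;
}
```

Because getopt_long hands back one switch per call, the loop doesn't care what order the switches arrive in. After a successful call, argv[result] would be the search string and any remaining arguments would be file names; if there are none, the filter would read stdin.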
Grading
The lab is worth 110 points, scored as follows:
- Basic Functionality: 40 points for being able to find (and not find) strings in a single file, correctly handling files that do not end in a newline, and generating a correct exit status.
- Switches: 5 points for each of the four switches you're asked to implement (i.e., 20 points if you implement all four).
- Standard Input and Multiple Files: 15 points for correctly handling multiple files or no files on the command line (including correct line prefixes).
- Long Lines: 10 points for handling lines of arbitrary length.
- Switch Ordering: 5 points for accepting arbitrary switch orders (note that the library functions getopt and getopt_long do this for you, so these are easy points!).
- Missing Files: 5 points for correctly handling missing files, including producing the proper exit code.
- Miscellaneous: 5 points for generating correct usage messages.
- Coding Style: 10 points for writing clean, well-structured, well-commented code that would make any CS 70 grutor proud.
Steps
fgrep, Then and Now
The fgrep command, where f stands for “fixed” (or “fast”),
searches through text looking for a fixed string rather than a
regular expression.
On a modern system, “fgrep” is often a link to the grep
executable, or a shell script that calls grep with the -F flag
and passes your arguments to it.
On the CS servers (running Gentoo Linux), fgrep is a script that
looks like
#!/usr/bin/env sh
exec "/bin/grep" -F "$@"
On a Fedora system, it's also a script:
#!/usr/bin/sh
cmd=${0##*/}
echo "$cmd: warning: $cmd is obsolescent; using grep -F" >&2
exec grep -F "$@"
On macOS, running file fgrep will tell you that it's a “Mach-O universal
binary”, but it's actually a hard link to the grep binary (as are
several other grep variants, including egrep, rgrep, bzgrep,
bzegrep, bzfgrep, zgrep, zegrep, and zfgrep). In this
case, the actual grep program changes its behavior to match that
of the name it's invoked by.
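The "behave according to the name you were invoked by" trick comes down to inspecting argv[0]. Here is a minimal sketch of the idea in C. The function names invoked_name and wants_fixed_strings are hypothetical, and this is not taken from any real grep source; it does the same job as the ${0##*/} expansion in the Fedora script above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: strip any leading directories from argv[0],
 * so "/usr/bin/fgrep" becomes "fgrep". */
const char *invoked_name(const char *argv0)
{
    assert(argv0 != NULL);
    const char *slash = strrchr(argv0, '/');  /* last '/' in the path */
    return slash != NULL ? slash + 1 : argv0;
}

/* Hypothetical policy: act as if -F had been given when the program
 * was started under the name "fgrep". */
bool wants_fixed_strings(const char *argv0)
{
    return strcmp(invoked_name(argv0), "fgrep") == 0;
}
```

A program's main function could call wants_fixed_strings(argv[0]) before parsing its switches, which would let one executable serve several command names through hard links.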