CS 105

fgrep Specification

Although our implementation will be simpler (and slower) than the version of fgrep that comes with your system, it will do its job entirely correctly within its limits.

IMPORTANT: Your output must match the output of the system fgrep exactly. We will test by mechanically comparing the two implementations. You can use the (included) runtests script to make sure they match.

Invoking (Our) fgrep

If you type man fgrep you'll find that the modern version of fgrep has a lot of switches. We won't be implementing all of those. Instead, our version will be invoked as follows (assuming you call your program fgrep):

./fgrep [-i] [-l | -n | -q] pattern [file | [file2 ... filen] ]

Here, the square brackets indicate optional arguments, so the minimal invocation requires that only a pattern be given. The switches have the following meanings:

-i Ignore case when matching strings.
-l Instead of printing matching lines, print the name of the file containing the match.
-n When printing a match, prefix it with the line number.
-q Quiet mode: don't print any matches, and don't complain about files that can't be opened. Instead, simply return a success/failure status. Exits after first match in any file.

You aren't required to implement all of the switches (see Grading).
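As a rough illustration of what -i changes, here is a minimal sketch of a substring test with optional case folding. The function name line_matches and the ignore_case parameter are our own inventions for this example, not part of the assignment; the case-sensitive path simply relies on the standard strstr.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if pattern occurs anywhere in line, 0 otherwise.
 * ignore_case stands for the (hypothetical) variable set by -i. */
int line_matches(const char *line, const char *pattern, int ignore_case)
{
    if (!ignore_case)
        return strstr(line, pattern) != NULL;

    size_t plen = strlen(pattern);
    if (plen == 0)
        return 1;                     /* empty pattern matches, as with strstr */

    /* Try every starting position, comparing characters in lower case. */
    for (const char *start = line; *start != '\0'; start++) {
        size_t i = 0;
        while (i < plen && start[i] != '\0' &&
               tolower((unsigned char)start[i]) ==
               tolower((unsigned char)pattern[i]))
            i++;
        if (i == plen)
            return 1;
    }
    return 0;
}
```

The function returns as soon as it finds one occurrence, which is all fgrep needs because a matching line is printed only once.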

Input

Standard In (stdin)

Shell Expansion

Remember that the shell will process the command-line arguments before they get to the program. So when you type a command line like

fgrep test < test1.txt

the shell processes and “swallows” the input redirection character (<) and the filename that follows it, and supplies the contents of test1.txt to the command on stdin.

The program will only see two arguments: fgrep (the name it was invoked by) and test (the pattern string). The lack of an (obvious) file argument is how your filter will know that it should read from the standard input device, stdin (which the shell has already opened).

If no input files are given on the command line, fgrep searches the standard input for the given pattern. That input could be a file supplied through shell redirection or text typed at the terminal (i.e., you type something and hit Return and the program processes that line of input).

File Contents

If one or more file arguments are provided, fgrep ignores stdin and searches each of the given files separately.

Errors

If there are usage errors (e.g., no pattern given or an illegal (undefined) switch), fgrep should print a usage message similar to the invocation line shown above (preceded by the string Usage: and a single space character) and then terminate without doing a search.

If file names are given but the file can't be accessed, or if an I/O error occurs (unlikely, and hard to test) then fgrep should print an appropriate error message but should continue its search on any other files that have been specified.

Notice that several of the switches conflict with one another: -q makes both -l and -n pointless, and -l makes -n pointless.

Don't generate an error message for these conflicts; instead, just ignore both -l and -n if the user also specifies -q, and ignore -n if the user also supplies -l.
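One way to apply these precedence rules is to normalize the switch variables once, right after argument processing, so the rest of the program never has to think about conflicts. The sketch below assumes a hypothetical struct options that holds one integer per switch; your own program may well keep them as separate variables.

```c
/* Hypothetical container for the four switch variables. */
struct options {
    int ignore_case;   /* -i */
    int list_files;    /* -l */
    int line_numbers;  /* -n */
    int quiet;         /* -q */
};

/* Apply the precedence rules: -q beats -l and -n; -l beats -n. */
void resolve_conflicts(struct options *opts)
{
    if (opts->quiet) {
        opts->list_files = 0;
        opts->line_numbers = 0;
    }
    if (opts->list_files)
        opts->line_numbers = 0;
}
```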

Output Format(s)

The primary purpose of fgrep is to print lines that contain the specified pattern. But the output format is designed to be human-friendly in the general case, which means that it varies depending on how fgrep was invoked. In particular,

  • If there is only one input file, or if fgrep is reading from stdin, the matched lines are printed verbatim, with no annotation (unless the -n switch is present).
  • If there are multiple input files, each file is processed separately and matched lines are prefixed by the file name and a colon.
  • If the -n switch is given, the matched lines are prefixed by the line number within the file, counting from 1. If there are multiple files, the file name comes first, followed by the line number, followed by the matched line.
  • If the -l switch is given, fgrep prints only the name(s) of the file(s) containing a match. In this case the -n switch is ignored.
  • If there are multiple matches on a line, it should only be printed once. Similarly, if the -l switch is given and there are multiple matches in a file, the file name should only be printed once. (A good implementation will stop the search early in these cases.)

(See below for examples.)
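The prefix rules for ordinary (non -l, non -q) output can be collected in one small printing routine. This is only a sketch: the name print_match and its parameters are hypothetical, and it assumes that a final line lacking a newline should still be printed with one, which you should confirm against the system fgrep.

```c
#include <stdio.h>
#include <string.h>

/* Print one matched line with the prefixes described above.
 * filename is NULL when no file-name prefix is wanted (one file or stdin).
 * lineno is used only when show_numbers is nonzero.
 * line normally still carries its trailing newline. */
void print_match(FILE *out, const char *filename, int show_numbers,
                 long lineno, const char *line)
{
    size_t len = strlen(line);

    if (filename != NULL)
        fprintf(out, "%s:", filename);
    if (show_numbers)
        fprintf(out, "%ld:", lineno);
    fputs(line, out);

    /* Assumption: always end the output line, even if the input line
     * (the last one in a file) had no newline of its own. */
    if (len == 0 || line[len - 1] != '\n')
        fputc('\n', out);
}
```

Passing the output stream as a parameter is a design choice for testability; calling it with stdout gives the behavior described in this section.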

Exit Status

If there are no errors and any matches are found anywhere, fgrep returns a zero (success) exit status.

If there are no matches, if there is any error (e.g., an I/O error or a file-not-found error), or if fgrep is invoked incorrectly (e.g., with a --watusi option), fgrep returns a nonzero (failure) exit status; see below for the specific values.

Exceptions

If the -q switch is used, one or more successful matches will yield an exit status of 0, even if there are errors.

Additional Exit Status Values

We suggest that your fgrep should return

  • 1 if there is an error or no match
  • 2 if there are usage errors

(Note that the “official” version of fgrep uses a slightly different convention for exit codes.)
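The exit-status rules, including the -q exception, fit in a few lines. The sketch below is one possible encoding; the function name exit_status, the enum names, and the three flag parameters are hypothetical. Usage errors are not handled here because they terminate the program with status 2 before any search begins.

```c
/* Suggested exit codes from this section (names are our own). */
enum { STATUS_MATCH = 0, STATUS_FAILURE = 1, STATUS_USAGE = 2 };

/* Compute the exit status once all inputs have been searched. */
int exit_status(int found_match, int had_error, int quiet)
{
    if (quiet && found_match)
        return STATUS_MATCH;      /* -q: a match wins even over errors */
    if (had_error || !found_match)
        return STATUS_FAILURE;    /* an error or no match at all */
    return STATUS_MATCH;
}
```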

Sample Invocations

Here are some examples of invocations and outputs. We have three text files:

test1.txt

This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.

test2.txt

I like spam.  But spam doesn't like me.
Our spam is very tasty.

test3.txt

This tests whether we can match at the end when there is no newline

where the line is not ended with a newline character.

Here are some samples of running fgrep on those three test files. Note that echo $? causes the shell to print the exit status of the last command as a single number on a separate line; thus we can see the success or failure status of each invocation.

Successful Run with a Single Filename Argument

./fgrep test test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
0

Matching lines printed; exit code 0.

Failed Run with a Single Filename Argument

./fgrep test test2.txt; echo $?
1

No match; exit code 1.

Multiple File Arguments

./fgrep test test1.txt test2.txt; echo $?
test1.txt:This is a test file.
test1.txt:We really like to test our tests.
test1.txt:It's important to have a line ending to test
0

Successful matches in one file; note prepended filename. Exit code 0.

Input Taken from stdin; -i Flag

./fgrep -i test < test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.
0

-i flag ignores case of text in files; successful matches printed; exit code 0.

Using the -n Flag

./fgrep -n -i our test1.txt test2.txt; echo $?
test1.txt:2:We really like to test our tests.
test2.txt:2:Our spam is very tasty.
0

Case ignored (-i); line numbering added with the -n flag; successful matches in two files; line number (and colon) follows filename; exit code 0.

The -q Switch; Successful Match

./fgrep -q -n -i our test1.txt test2.txt; echo $?
0

Case ignored (-i); -n flag is ignored; at least one match found in at least one file (we know that there is one match for our in each file from other runs). No output; exit code 0.

-q Switch; Different Search String

./fgrep -q -n -i spam test1.txt test2.txt; echo $?
0

As previous example, but with different search string (spam). At least one match found; no output; exit code 0.

-q Switch; Unmatched Search String

./fgrep -q -n -i chocolate test1.txt test2.txt; echo $?
1

New search string chocolate, which isn't in either test file. Match fails; no output; exit code 1.

-q Flag; Missing File

./fgrep -q -i spam test1.txt test2.txt missing.txt; echo $?
0

Searching in three files, one of which does not exist (missing.txt). At least one match found in at least one file. No output; missing file ignored; exit code 0.

Missing File without -q Flag

./fgrep like test1.txt missing.txt test2.txt; echo $?
test1.txt:We really like to test our tests.
Couldn't open 'missing.txt': No such file or directory
test2.txt:I like spam.  But spam doesn't like me.
1

Without -q flag, we see matches are found and printed; this time, missing file is noticed and error message printed where successful results would have appeared.

Despite successful matches, error message results in exit code 1.

The -n Flag

./fgrep -n test test1.txt; echo $?
1:This is a test file.
2:We really like to test our tests.
3:It's important to have a line ending to test
0

The -n flag gives us line numbers for matches, but doesn't include the filename when searching a single file. Exit code 0.

The -l Flag

./fgrep -l like test1.txt test2.txt; echo $?
test1.txt
test2.txt
0

The -l flag tells fgrep to only print the filename(s) with successful matches. Successful matches; exit code 0.

The -l Flag and stdin

./fgrep -l test < test1.txt; echo $?
(standard input)
0

Input coming from stdin via shell redirection, so fgrep doesn't have any filenames to report; instead it uses (standard input) as the “filename” when a match is found.

Successful match; exit code 0.

Improper Usage

./fgrep -l -v test < test1.txt; echo $?
Usage: fgrep [-i] [-l] [-n] [-q] pattern [files]
2

Using an unsupported flag (-v) results in fgrep printing the usage message and exiting with code 2. Note that no attempt at matching was performed; the bad flag was caught during option processing.

Argument Processing

In the C language, the main program is given two arguments: argc and argv.

There is actually a third argument, envp, but we can get away with ignoring it in almost all programs.

argc is an integer equal to the number of arguments that were given on the command line; note that the command name itself is always the first argument so argc is always nonzero.

The arguments themselves are C-style strings stored in the array argv. Thus argv[0] is the name of the program (which you can verify in gdb by typing p argv[0] if you're stopped at a breakpoint in main).

Similarly, argv[1] is the first argument expressed as a string, and, since a C string is an array, argv[1][0] is the first character of the first argument. That can be very useful because if that character is a dash, the argument must be a switch.

Note that treating every argument that begins with a dash as a switch means that the pattern you're searching for cannot begin with a dash.

The standard version of fgrep has a solution to that problem, but we'll solve it by ignoring it.

As mentioned in the Overview of a Unix Filter section in the lab intro, switches can appear in any order. Thus it is incorrect to write code that assumes that switches will appear in any particular order, such as

if (strcmp(argv[1], "-q") == 0) /* Process -q switch */
if (strcmp(argv[2], "-l") == 0) /* Process -l switch */

That way lies madness. Instead, you should loop over the arguments, using an if/else if construct, a switch statement, or a combination of the two to figure out whether the argument is a switch or the pattern.

For each switch, you should have a corresponding integer variable that is zero if the switch is not present, or nonzero if it appears. Your switch-processing code can then simply set the appropriate variable to 1.
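A hand-written loop along these lines might look like the following sketch. The struct options type and the parse_args name are hypothetical, and the sketch assumes each switch is given as its own argument (it rejects grouped forms such as -in). It stops at the first argument that does not begin with a dash and treats that argument as the pattern.

```c
#include <string.h>

/* Hypothetical container: one integer per switch, zero if absent. */
struct options {
    int ignore_case;   /* -i */
    int list_files;    /* -l */
    int line_numbers;  /* -n */
    int quiet;         /* -q */
};

/* Returns the index of the pattern in argv, or -1 on a usage error. */
int parse_args(int argc, char *argv[], struct options *opts)
{
    int i;

    memset(opts, 0, sizeof *opts);
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            break;                      /* first non-switch: the pattern */
        if (strcmp(argv[i], "-i") == 0)
            opts->ignore_case = 1;
        else if (strcmp(argv[i], "-l") == 0)
            opts->list_files = 1;
        else if (strcmp(argv[i], "-n") == 0)
            opts->line_numbers = 1;
        else if (strcmp(argv[i], "-q") == 0)
            opts->quiet = 1;
        else
            return -1;                  /* undefined switch */
    }
    return (i < argc) ? i : -1;         /* -1 if no pattern was given */
}
```

With this shape, main can print the usage message and exit whenever the result is -1; otherwise argv[result] is the pattern and everything after it is a file name.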

Because nearly every program needs to process arguments and switches, there have been many attempts to write library functions that can simplify the task. For C, two of the most popular are getopt and getopt_long, both of which have extensive manual pages.

You are welcome to investigate either or both of these options, or to simply write your own argument-processing loop.

No matter how you choose to approach argument processing, when you reach a non-switch argument you should assume that it is the pattern and break out of the loop. Any arguments after the pattern will be (expected to be) the names of files to process.

Handling Files

If there are no file arguments provided on the command line, fgrep should read from standard input (stdin), which is already open by default and is of type FILE *. But if there are file names on the command line, you will need to open each file in turn using fopen (which returns a FILE *) and search through that file. Thus, it's best to have a helper function (in our sample solution, we brilliantly called ours fgrep) that accepts a FILE * and some other arguments, and does the real work.
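The overall shape of that dispatch might look like the sketch below. Everything named here is hypothetical: search_stream is only a stand-in for the real helper (it merely reports whether the stream contains any data), and search_inputs is our own name for the loop. The sketch also assumes that the error message goes to stderr and uses the wording shown in the sample runs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the real worker: replace the body with your line-by-line
 * search. Here it only reports whether the stream has any data at all. */
static int search_stream(FILE *in, const char *label)
{
    (void)label;
    return fgetc(in) != EOF;
}

/* Search stdin if nfiles is 0, otherwise each named file in turn.
 * Sets *had_error if a file could not be opened; returns 1 on any match. */
int search_inputs(int nfiles, char *files[], int quiet, int *had_error)
{
    int found = 0;

    *had_error = 0;
    if (nfiles == 0)
        return search_stream(stdin, "(standard input)");

    for (int i = 0; i < nfiles; i++) {
        FILE *in = fopen(files[i], "r");
        if (in == NULL) {
            *had_error = 1;
            if (!quiet)             /* -q suppresses the complaint */
                fprintf(stderr, "Couldn't open '%s': %s\n",
                        files[i], strerror(errno));
            continue;               /* keep going with the other files */
        }
        if (search_stream(in, files[i]))
            found = 1;
        fclose(in);
        if (found && quiet)
            break;                  /* -q stops after the first match */
    }
    return found;
}
```

If the pattern sits at argv[p], a call such as search_inputs(argc - p - 1, argv + p + 1, quiet, &had_error) hands the loop exactly the file arguments.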

The best way to deal with the input file is to read it one line at a time, check for the presence of the search string, and then continue to the next line.

You can use fgets to do just that: it allows you to read from an arbitrary FILE * and protects you from buffer-overflow attacks.

Keep in mind, however, that you also need to handle arbitrarily long input lines, so you can't just declare a 1000-byte buffer and assume that all lines will be shorter than that. Instead, you should

  1. Allocate a small buffer and use fgets to read into it.
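  2. If the result doesn't end in a newline and you haven't reached the end of the file, your buffer was too small. (The last line of a file, as in test3.txt, may legitimately lack a newline.)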
    • Expand your buffer by doubling its size.
    • Read again, appending the new data to the existing buffer.
    • Continue until the buffer is big enough to hold the whole line.
  3. Do your search on the contents of the buffer.

(You'll probably want to create some helper functions.)

A few notes on this approach:

  • In C, you allocate memory using malloc, which accepts a single integer argument that is the amount of memory, in bytes, that you want. malloc returns a pointer to the desired memory, or NULL if you are out of memory.
  • You can expand previously allocated memory with realloc. The expanded memory might be in a different place (so you will need to update the pointer to it), but it is guaranteed to contain the data that was in the original. See the man page for realloc for syntax.
  • Once you have expanded the buffer, you should keep it at the expanded size. There is no point in freeing it and starting over.
  • To make sure your expansion code works, we recommend that you start with a ridiculously tiny buffer (our sample solution used 2 bytes; a single byte is too small). Starting with a small buffer will exercise your expansion code and quickly reveal any bugs.
  • Be careful about memory leaks! You can free memory with free.
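Putting the steps and notes above together, a line reader could be sketched as follows. This is not the sample solution; the name read_line and its interface are our own assumptions. The caller owns the buffer, passes it back in on every call so that it keeps its expanded size, and frees it once at the end. The sketch does not try to cope with NUL bytes inside a line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one full line from in into *buf, growing the buffer as needed.
 * Start with *buf == NULL; *buf and *cap persist across calls.
 * Returns 1 if a line was read, 0 at end of file, -1 if out of memory. */
int read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;

    if (*buf == NULL) {
        *cap = 2;                       /* deliberately tiny, to test growth */
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    /* Each fgets call appends to what we already have. */
    while (fgets(*buf + len, (int)(*cap - len), in) != NULL) {
        len += strlen(*buf + len);
        if (len > 0 && (*buf)[len - 1] == '\n')
            return 1;                   /* got the whole line */

        /* No newline yet: double the buffer and read some more. */
        char *bigger = realloc(*buf, *cap * 2);
        if (bigger == NULL)
            return -1;                  /* *buf is still valid; caller frees it */
        *buf = bigger;
        *cap *= 2;
    }
    return len > 0 ? 1 : 0;             /* last line may lack a newline */
}
```

A typical caller loops while read_line returns 1, searches *buf each time, and calls free on the buffer after the last file has been processed.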
