fgrep Specification
Although our implementation will be simpler (and slower) than the
version of fgrep that may come with your system, it will do its job
correctly within its limits.
IMPORTANT: Your output must match the output of the system
fgrep exactly. We will test your code by mechanically comparing
your fgrep to the system fgrep. (You can (and should!) use the
(included) runtests script to make sure they match.)
Invoking (Our) fgrep
If you type man fgrep you may find that the modern version of
fgrep has a lot of switches, often because it's really just grep
with a different name. You might also find that the
man page for fgrep is the man page for grep, or that there's
no man page at all, as fgrep is considered to be obsolete. (Use grep -F!)
We won't be implementing all of those
options. Instead, our version will be invoked as follows:
./fgrep [-i] [-l | -n | -q] pattern [file | [file2 ... filen ] ]
Here, the square brackets indicate optional arguments, so the minimal invocation requires that only a pattern be given. The switches have the following meanings:
-i | Ignore case when matching strings.
-l | Instead of printing matching lines, print the name of the file containing the match.
-n | When printing a match, prefix it with the line number.
-q | Quiet mode: don't print any matches, and don't complain about files that can't be opened. Instead, simply return a success/failure status. Exits after first match in any file.
Input
Standard In (stdin)
Shell Expansion
Remember that the shell will process the command-line arguments before they get to the program. So when you type a command line like
./fgrep test < test1.txt
the shell processes and “swallows” the input redirection character
(<) and the filename that follows it, and supplies the contents of
test1.txt to the command on stdin.
The program will only see two arguments: ./fgrep (the name it was
invoked by) and test (the pattern string). The lack of an
(obvious) file argument is how your filter will know that it should
read from the standard input device, stdin (which the shell has
already opened).
If there are no input files given on the command line, fgrep
searches the standard input for the given pattern. That input could
come from a shell redirect or directly from the terminal (i.e.,
you type something and hit Return and the
program processes that line of input).
File Contents
If one or more file arguments are provided, fgrep ignores
stdin and searches each of the given files separately.
Errors
If there are usage errors (e.g., no pattern given or an illegal
(undefined) switch), fgrep should print a usage message similar to
the table shown above (preceded by the string Usage: and a single
space character) and then terminate without doing a search.
If file names are given but the file can't be accessed, or if an I/O
error occurs (unlikely, and hard to test) then fgrep should print an
appropriate error message but should continue its search on any other
files that have been specified.
Notice that several of the switches conflict with one another: -q
makes both -l and -n pointless, and -l makes -n pointless.
Don't generate an error message for these conflicts; instead, just
ignore both -l and -n if the user also specifies -q, and
ignore -n if the user also supplies -l.
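The precedence rules above can be applied once, right after option processing, so that the rest of the program never has to think about conflicts. The following is only a sketch: the struct and its field names are our own invention for illustration, not part of the assignment.

```c
/* Hypothetical container for the four switches; the names are illustrative. */
struct switches {
    int ignore_case;   /* -i */
    int names_only;    /* -l */
    int line_numbers;  /* -n */
    int quiet;         /* -q */
};

/* Silently resolve conflicts: -q overrides -l and -n; -l overrides -n. */
void resolve_conflicts(struct switches *sw)
{
    if (sw->quiet) {
        sw->names_only = 0;
        sw->line_numbers = 0;
    } else if (sw->names_only) {
        sw->line_numbers = 0;
    }
}
```

Note that -i is never affected, since it changes how lines are matched rather than how results are reported.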
Output Format(s)
The primary purpose of fgrep is to print lines that contain the
specified pattern. But the output format is designed to be
human-friendly in the general case, which means that it varies
depending on how fgrep was invoked. In particular,
- If there is only one input file, or if fgrep is reading from stdin, the matched lines are printed verbatim, with no annotation (unless the -n switch is present).
- If there are multiple input files, each file is processed separately and matched lines are prefixed by the file name and a colon.
- If the -n switch is given, the matched lines are prefixed by the line number within the file, counting from 1, and a colon. If there are multiple files, the file name comes first, followed by the line number, followed by the matched line.
- If the -l switch is given, fgrep prints only the name(s) of the file(s) containing a match. In this case the -n switch is ignored.
- If there are multiple matches on a line, it should only be printed once. Similarly, if the -l switch is given and there are multiple matches in a file, the file name should only be printed once. (A good implementation will stop the search early in these cases.)
Both the -l and -q options cause fgrep to stop processing a file as soon as a match is found, so they will be more efficient than the other options when there are many matches in a file.
(See below for examples.)
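One possible way to produce the prefixes described in the list above is a small printing helper. This is a sketch under our own assumptions: the function name and parameters are hypothetical, and it assumes that a final line lacking a newline should still be printed with one (which is what we believe the system fgrep does, but check with runtests).

```c
#include <stdio.h>
#include <string.h>

/*
 * Print one matched line to `out`, adding "name:" when several files are
 * being searched and "lineno:" when -n is in effect.
 * `line` is the text as read; it may or may not end in '\n'.
 */
void print_match(FILE *out, const char *name, int multiple_files,
                 int show_numbers, long lineno, const char *line)
{
    size_t len = strlen(line);

    if (multiple_files)
        fprintf(out, "%s:", name);
    if (show_numbers)
        fprintf(out, "%ld:", lineno);
    fputs(line, out);
    /* Assumption: always terminate the output line, even at end of file. */
    if (len == 0 || line[len - 1] != '\n')
        fputc('\n', out);
}
```

The -l and -q cases are deliberately not handled here, because in those modes no matched lines are printed at all.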
Exit Status
If there are no errors and any matches are found anywhere, fgrep
returns a zero (success) exit status.
If there are no matches, there is any error (e.g., an I/O error; a
file-not-found error), or if fgrep is invoked incorrectly (e.g.,
with a --watusi option), fgrep returns nonzero (failure—see below).
Exceptions
If the -q switch is used, any number of successful matches
will yield an exit status of 0, even if there are errors.
Additional Exit Status Values
We suggest that your fgrep should return
- 1 if there is an error or no match
- 2 if there are usage errors
(Note that the “official” version of fgrep uses a slightly
different convention for exit codes.)
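The exit-status rules, including the -q exception, can be collected into one small function. The sketch below assumes the program tracks two flags while it runs (whether any match was found and whether any error occurred); the function and constant names are ours. Usage errors are assumed to be handled earlier, by returning 2 directly from option processing.

```c
/* Suggested exit codes; the constant names are illustrative. */
enum { STATUS_MATCHED = 0, STATUS_NO_MATCH_OR_ERROR = 1, STATUS_USAGE = 2 };

/* Decide the exit status once all inputs have been searched. */
int exit_status(int matched, int had_error, int quiet)
{
    if (quiet && matched)
        return STATUS_MATCHED;            /* -q: a match wins even if errors occurred */
    if (had_error || !matched)
        return STATUS_NO_MATCH_OR_ERROR;  /* no match, or something went wrong */
    return STATUS_MATCHED;
}
```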
Sample Invocations
Here are some examples of invocations and outputs. We have three text files:
test1.txt
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.
test2.txt
I like spam. But spam doesn't like me.
Our spam is very tasty.
test3.txt
This tests whether we can match at the end when there is no newline
where the line is not ended with a newline character.
Here are some samples of running fgrep on those test files.
Note that echo $? causes the shell to print the exit status of
the last command as a single number on a separate line; thus we can
see the success or failure status of each invocation.
Successful Run with a Single Filename Argument
./fgrep test test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
0
Matching lines printed; exit code 0.
Failed Run with a Single Filename Argument
./fgrep test test2.txt; echo $?
1
No match; exit code 1.
Multiple File Arguments
./fgrep test test1.txt test2.txt; echo $?
test1.txt:This is a test file.
test1.txt:We really like to test our tests.
test1.txt:It's important to have a line ending to test
0
Successful matches in one file; note prepended filename. Exit code 0.
Input Taken from stdin; -i Flag
./fgrep -i test < test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.
0
-i flag ignores case of text in files; successful matches printed; exit code 0.
Using the -n Flag
./fgrep -n -i our test1.txt test2.txt; echo $?
test1.txt:2:We really like to test our tests.
test2.txt:2:Our spam is very tasty.
0
Case ignored (-i); line numbering added with the -n flag; successful matches in two files; line number (and colon) follows filename; exit code 0.
The -q Switch; Successful Match
./fgrep -q -n -i our test1.txt test2.txt; echo $?
0
Case ignored (-i); -n flag is ignored; at least one match found in at least one file (we know that there is one match for our in each file from other runs). No output; exit code 0.
-q Switch; Different Search String
./fgrep -q -n -i spam test1.txt test2.txt; echo $?
0
As previous example, but with different search string (spam). At
least one match found; no output; exit code 0.
-q Switch; Unmatched Search String
./fgrep -q -n -i chocolate test1.txt test2.txt; echo $?
1
New search string chocolate, which isn't in either test file.
Match fails; no output; exit code 1.
-q Flag; Missing File
./fgrep -q -i spam test1.txt test2.txt missing.txt; echo $?
0
Searching in three files, one of which does not exist (missing.txt). At least one match found in at least one file. No output; missing file ignored; exit code 0.
Missing File without -q Flag
./fgrep like test1.txt missing.txt test2.txt; echo $?
test1.txt:We really like to test our tests.
Couldn't open 'missing.txt': No such file or directory
test2.txt:I like spam. But spam doesn't like me.
1
Without -q flag, we see matches are found and printed; this time,
missing file is noticed and error message printed where successful
results would have appeared.
Despite successful matches, error message results in exit code 1.
The -n Flag
./fgrep -n test test1.txt; echo $?
1:This is a test file.
2:We really like to test our tests.
3:It's important to have a line ending to test
0
The -n flag gives us line numbers for matches, but doesn't include the filename when searching a single file. Exit code 0.
The -l Flag
./fgrep -l like test1.txt test2.txt; echo $?
test1.txt
test2.txt
0
The -l flag tells fgrep to only print the filename(s) with successful matches.
Successful matches; exit code 0.
The -l Flag and stdin
./fgrep -l test < test1.txt; echo $?
(standard input)
0
Input coming from stdin via shell expansion, so fgrep doesn't
have any filenames to report; instead it uses (standard input) as
the “filename” when a match is found.
Successful match; exit code 0.
Improper Usage
./fgrep -l -v test < test1.txt; echo $?
Usage: fgrep [-i] [-l] [-n] [-q] pattern [files]
2
Using an unsupported flag (-v) results in fgrep printing the
usage message and exiting with code 2. Note that no attempt at
matching was performed; the bad flag was caught during option
processing.
Argument Processing
In the C language, the main program is given two arguments: argc
and argv. (There is also a third, envp, but we can get away with
ignoring it in almost all programs.)
argc is an integer equal to the number of arguments that were
given on the command line; note that the command name itself is
always the first argument so argc is always nonzero.
The arguments themselves are C-style strings stored in the array
argv. Thus argv[0] is the name of the program (which you can
verify in gdb by typing p argv[0] if you're stopped at a
breakpoint in main).
Similarly, argv[1] is the first argument expressed as a string,
and, since a C string is an array, argv[1][0] is the first character
of the first argument. That can be very useful because if that
character is a dash, the argument must be a switch.
Note that assuming an argument beginning with a dash is a switch necessarily implies that the pattern you're searching for cannot begin with a dash.
The standard version of fgrep has a solution to that problem, but
we'll “solve” the issue by ignoring it.
As mentioned in the Overview of a Unix Filter section in the lab intro, switches can appear in any order. Thus it is incorrect to write code that assumes that switches will appear in any particular order, such as
if (strcmp(argv[1], "-q") == 0) /* Process -q switch */
if (strcmp(argv[2], "-l") == 0) /* Process -l switch */
That way lies madness. Instead, you should loop over the arguments,
using an if/else if construct, a switch statement, or a
combination of the two to figure out whether the argument is a
switch or the pattern.
For each switch, you should have a corresponding integer variable that is zero if the switch is not present, or nonzero if it appears. Your switch-processing code can then simply set the appropriate variable to 1.
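A hand-written loop along these lines might look like the following sketch. The struct, the function name, and the convention of returning the index of the pattern (or -1 for a usage error) are all our own choices for illustration. It also assumes that switches may be bundled (as in -in) and that a bare dash counts as a usage error; neither is required by the spec.

```c
#include <string.h>

/* One integer per switch: zero if absent, nonzero if present. */
struct options {
    int ignore_case;   /* -i */
    int names_only;    /* -l */
    int line_numbers;  /* -n */
    int quiet;         /* -q */
};

/*
 * Scan the leading switches in argv, in whatever order they appear.
 * Returns the index of the pattern in argv, or -1 on a usage error
 * (undefined switch, bare "-", or no pattern at all).
 */
int parse_switches(int argc, char *argv[], struct options *opts)
{
    int i;

    memset(opts, 0, sizeof *opts);
    for (i = 1; i < argc && argv[i][0] == '-'; i++) {
        const char *p = argv[i] + 1;      /* skip the dash */

        if (*p == '\0')
            return -1;                    /* assumption: "-" alone is an error */
        for (; *p != '\0'; p++) {
            switch (*p) {
            case 'i': opts->ignore_case = 1;  break;
            case 'l': opts->names_only = 1;   break;
            case 'n': opts->line_numbers = 1; break;
            case 'q': opts->quiet = 1;        break;
            default:  return -1;          /* undefined switch */
            }
        }
    }
    return (i < argc) ? i : -1;           /* argv[i] is the pattern, if any */
}
```

After a successful call, any file names start at the index just past the returned one.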
Because nearly every program needs to process arguments and
switches, there have been many attempts to write library functions
that can simplify the task. For C, two of the most popular are
getopt and getopt_long, both of which have extensive manual
pages.
You are welcome to investigate either or both of these options (getopt is used in the count.c example), or
you can write your own argument-processing loop, like an animal.
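If you go the getopt route, the core of the option handling might look like this sketch. The struct and function names are hypothetical, and this is not a copy of anything in count.c. It sets opterr to 0 on the assumption that you want to print your own usage message rather than getopt's default complaint.

```c
#define _POSIX_C_SOURCE 200809L   /* getopt is POSIX, not ISO C */
#include <unistd.h>

struct options {
    int ignore_case;   /* -i */
    int names_only;    /* -l */
    int line_numbers;  /* -n */
    int quiet;         /* -q */
};

/*
 * Process switches with getopt. Returns the index of the pattern in argv,
 * or -1 on a usage error (undefined switch or missing pattern).
 */
int parse_with_getopt(int argc, char *argv[], struct options *opts)
{
    int c;

    opts->ignore_case = opts->names_only = 0;
    opts->line_numbers = opts->quiet = 0;
    opterr = 0;                           /* we report usage errors ourselves */

    while ((c = getopt(argc, argv, "ilnq")) != -1) {
        switch (c) {
        case 'i': opts->ignore_case = 1;  break;
        case 'l': opts->names_only = 1;   break;
        case 'n': opts->line_numbers = 1; break;
        case 'q': opts->quiet = 1;        break;
        default:  return -1;              /* getopt returned '?' */
        }
    }
    return (optind < argc) ? optind : -1; /* first non-option is the pattern */
}
```

Because getopt keeps its position in the global variable optind, the function is meant to be called once per program run.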
You may assume that the user puts all option flags before the pattern and file names. If they don't do that, it's up to you whether you consider those arguments as options or strangely named files. (The standard version of fgrep allows options and file names to be intermixed, but we won't require that.)
Handling Files
If there are no file arguments provided on the command line, fgrep
should read from standard input (stdin), which is already open by
default and is of type FILE *. But if there are file names on
the command line, you will need to open each file in turn using
fopen (which returns a FILE *) and search through that file.
Thus, it's best to have a helper function (in our sample solution,
we brilliantly called ours fgrep) that accepts a FILE * and some
other arguments, and does the real work.
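Opening each file can be wrapped in a small function that also takes care of the error message. The sketch below uses a hypothetical name and assumes the message format shown in the sample invocations; sending it to stderr is our assumption as well.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Open `name` for reading. On failure, return NULL and (unless -q is in
 * effect) print an error message so the caller can move on to the next file.
 */
FILE *open_input(const char *name, int quiet)
{
    FILE *in = fopen(name, "r");

    if (in == NULL && !quiet)
        fprintf(stderr, "Couldn't open '%s': %s\n", name, strerror(errno));
    return in;
}
```

The caller would loop over the file arguments, skip any for which this returns NULL (remembering that an error occurred), pass each open FILE * to the helper, and fclose it afterwards. With no file arguments, it would pass stdin to the helper directly.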
The best way to deal with the input file is to read it one line at a time, check for the presence of the search string, and then continue to the next line.
The easiest way to read lines from a file is to use the getline function, which is part of the standard C library on a POSIX system. getline takes care of allocating and resizing a buffer as needed, and it returns the length of the line read. The rev.c example in the examples directory shows how to use getline to read lines of arbitrary length. You can refer to that code as you work on your fgrep implementation. You can also read the manual page for getline by running man 3 getline to see how it works.
Once you have the line read, you can use the strstr function to check if the pattern is present in the line. The contains.c example in the examples directory shows how to use strstr to check for a substring within a string. You can refer to that code as you work on your fgrep implementation, and you can also read the manual page for strstr by running man 3 strstr to see how it works.
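Putting getline and strstr together, the heart of the search loop could be sketched as follows. This is not the code from rev.c or contains.c. To stay short it just counts matching lines rather than printing them, and the function name is made up.

```c
#define _POSIX_C_SOURCE 200809L   /* getline is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Count the lines of `in` that contain `pattern` at least once. */
long count_matching_lines(FILE *in, const char *pattern)
{
    char *line = NULL;    /* getline allocates and grows this buffer for us */
    size_t capacity = 0;
    long matches = 0;

    while (getline(&line, &capacity, in) != -1) {
        /* A line counts once, however many times the pattern occurs in it. */
        if (strstr(line, pattern) != NULL)
            matches++;
    }
    free(line);           /* the buffer is ours to release when done */
    return matches;
}
```

A real implementation would print (or otherwise report) each matching line inside the loop, and could break out early for -l and -q.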
Unfortunately, when it comes to ignoring case, there isn't a standard library function that does that for you (actually, there is a function called strcasestr, but it's not part of the standard C library or POSIX specification, so we can't rely on it in portable code). There is a function strcasecmp that compares two strings while ignoring case, along with a length-limited relative, strncasecmp. Because strcasecmp compares entire strings, the length-limited form is the one that suits a substring search: you can loop over each starting position in the line and compare just the first strlen(pattern) characters against the pattern. You can read the manual page for both functions by running man 3 strcasecmp to see how they work.
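As one way to do the case-insensitive search, the sketch below walks over every starting position in the line and uses strncasecmp (the length-limited relative of strcasecmp, declared in strings.h on POSIX systems) to compare only as many characters as the pattern has. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return 1 if `pattern` occurs somewhere in `line`, ignoring case; else 0. */
int contains_ignore_case(const char *line, const char *pattern)
{
    size_t line_len = strlen(line);
    size_t pat_len = strlen(pattern);

    /* Try each position where the whole pattern could still fit. */
    for (size_t i = 0; i + pat_len <= line_len; i++) {
        if (strncasecmp(line + i, pattern, pat_len) == 0)
            return 1;
    }
    return 0;
}
```

Like strstr, this treats an empty pattern as matching every line.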