fgrep Specification
Although our implementation will be simpler (and slower) than the
version of fgrep that comes with your system, it will do its job
entirely correctly within its limits.
IMPORTANT: Your output must match the output of the system
fgrep exactly. We will test by mechanically comparing the two
implementations. You can use the (included) runtests script to
make sure they match.
Invoking (Our) fgrep
If you type man fgrep you'll find that the modern version of fgrep has a lot of switches. We won't be implementing all of those. Instead, our version will be invoked as follows (assuming you call your program fgrep):
./fgrep [-i] [-l | -n | -q] pattern [file | [file2 ... filen ] ]
Here, the square brackets indicate optional arguments, so the minimal invocation requires that only a pattern be given. The switches have the following meanings:
-i    Ignore case when matching strings.
-l    Instead of printing matching lines, print the name of the file containing the match.
-n    When printing a match, prefix it with the line number.
-q    Quiet mode: don't print any matches, and don't complain about files that can't be opened. Instead, simply return a success/failure status. Exits after first match in any file.
You aren't required to implement all of the switches (see Grading).
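As a rough illustration of what -i implies for matching, here is a minimal sketch of a substring test that can optionally ignore case. The name line_contains and its signature are assumptions made for this example, not part of the assignment; the case-sensitive path simply defers to strstr.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `pattern` occurs anywhere in `line`,
 * 0 otherwise. When ignore_case is nonzero, characters are compared
 * after converting both sides with tolower(). */
int line_contains(const char *line, const char *pattern, int ignore_case)
{
    size_t plen;

    if (!ignore_case)
        return strstr(line, pattern) != NULL;

    plen = strlen(pattern);
    if (plen == 0)
        return 1;                       /* empty pattern matches every line */

    for (; *line != '\0'; line++) {
        size_t i = 0;

        /* Compare the pattern against the text starting at `line`. */
        while (i < plen && line[i] != '\0' &&
               tolower((unsigned char)line[i]) ==
               tolower((unsigned char)pattern[i]))
            i++;
        if (i == plen)
            return 1;                   /* whole pattern matched here */
    }
    return 0;
}
```

The casts to unsigned char are there because tolower is only defined for values representable as unsigned char (or EOF).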
Input
Standard In (stdin)
Shell Expansion
Remember that the shell will process the command-line arguments before they get to the program. So when you type a command line like
fgrep test < test1.txt
the shell processes and “swallows” the input redirection character
(<) and the filename that follows it, and supplies the contents of
test1.txt to the command on stdin.
The program will only see two arguments: fgrep (the name it was
invoked by) and test (the pattern string). The lack of an
(obvious) file argument is how your filter will know that it should
read from the standard input device, stdin (which the shell has
already opened).
If there are no input files given on the command line, fgrep
searches the standard input for the given pattern. That input may be
a file redirected by the shell, or it may be text typed at the
terminal (i.e., you type something and hit Return and the
program processes that line of input).
File Contents
If one or more file arguments are provided, fgrep ignores
stdin and searches each of the given files separately.
Errors
If there are usage errors (e.g., no pattern given or an illegal
(undefined) switch), fgrep should print a usage message similar to
the invocation line shown above (preceded by the string Usage: and a single
space character) and then terminate without doing a search.
If file names are given but one of the files can't be accessed, or if an I/O
error occurs (unlikely, and hard to test), then fgrep should print an
appropriate error message but should continue its search on any other
files that have been specified.
Notice that several of the switches conflict with one another: -q
makes both -l and -n pointless, and -l makes -n pointless.
Don't generate an error message for these conflicts; instead, just
ignore both -l and -n if the user also specifies -q, and
ignore -n if the user also supplies -l.
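One way to apply these rules is to normalize the flags once, right after option processing, so the rest of the program never has to think about conflicts. The following is only a sketch; the struct options type, its field names, and resolve_conflicts are all names invented for this example.

```c
/* Hypothetical flag set: one int per switch (0 = absent, nonzero = present).
 * The struct and its field names are assumptions for this sketch. */
struct options {
    int ignore_case;    /* -i */
    int names_only;     /* -l */
    int line_numbers;   /* -n */
    int quiet;          /* -q */
};

/* Silently drop the switches that another switch makes pointless. */
void resolve_conflicts(struct options *o)
{
    if (o->quiet) {             /* -q overrides both -l and -n */
        o->names_only = 0;
        o->line_numbers = 0;
    }
    if (o->names_only)          /* -l overrides -n */
        o->line_numbers = 0;
}
```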
Output Format(s)
The primary purpose of fgrep is to print lines that contain the
specified pattern. But the output format is designed to be
human-friendly in the general case, which means that it varies
depending on how fgrep was invoked. In particular,
- If there is only one input file, or if fgrep is reading from stdin, the matched lines are printed verbatim, with no annotation (unless the -n switch is present).
- If there are multiple input files, each file is processed separately and matched lines are prefixed by the file name and a colon.
- If the -n switch is given, the matched lines are prefixed by the line number within the file, counting from 1. If there are multiple files, the file name comes first, followed by the line number, followed by the matched line.
- If the -l switch is given, fgrep prints only the name(s) of the file(s) containing a match. In this case the -n switch is ignored.
- If there are multiple matches on a line, it should only be printed once. Similarly, if the -l switch is given and there are multiple matches in a file, the file name should only be printed once. (A good implementation will stop the search early in these cases.)
(See below for examples.)
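The prefix rules for matched lines can be sketched as a single printing helper. This is an illustration only: print_match and its parameters are hypothetical names, and the caller is assumed to pass filename as NULL when there is a single input file or when reading stdin. The sketch also assumes that a final line with no trailing newline should still be printed with one, which is what GNU grep does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print one matched line with the appropriate prefix.
 * filename is NULL when no file name should be shown (single file or stdin);
 * show_numbers is nonzero when -n is in effect. */
void print_match(FILE *out, const char *filename, int show_numbers,
                 long lineno, const char *line)
{
    size_t len = strlen(line);

    if (filename != NULL)
        fprintf(out, "%s:", filename);      /* file name comes first */
    if (show_numbers)
        fprintf(out, "%ld:", lineno);       /* then the line number */
    fputs(line, out);
    if (len == 0 || line[len - 1] != '\n')
        fputc('\n', out);                   /* assumption: always end with a newline */
}
```

With -l in effect this helper would not be called at all; the file name would be printed once and the search of that file could stop.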
Exit Status
If there are no errors and any matches are found anywhere, fgrep
returns a zero (success) exit status.
If there are no matches, if there is any error (e.g., an I/O error or a
file-not-found error), or if fgrep is invoked incorrectly (e.g.,
with a --watusi option), fgrep returns nonzero (failure; see below).
Exceptions
If the -q switch is used, one or more successful matches
will yield an exit status of 0, even if there are errors.
Additional Exit Status Values
We suggest that your fgrep should return
- 1 if there is an error or no match
- 2 if there are usage errors
(Note that the “official” version of fgrep uses a slightly
different convention for exit codes.)
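Putting the exit-status rules together, the final status can be computed from three facts gathered during the run. This is a sketch under the conventions suggested above; exit_status and its parameters are hypothetical names, and usage errors (status 2) are assumed to have been handled earlier, during option processing.

```c
/* Hypothetical helper: compute the final exit status.
 *   matched   - nonzero if any match was found in any input
 *   had_error - nonzero if any file could not be opened or read
 *   quiet     - nonzero if -q was given */
int exit_status(int matched, int had_error, int quiet)
{
    if (matched && (quiet || !had_error))
        return 0;       /* success; with -q a match wins even after errors */
    return 1;           /* no match, or an error without -q */
}
```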
Sample Invocations
Here are some examples of invocations and outputs. We have three text files:
test1.txt
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.
test2.txt
I like spam. But spam doesn't like me.
Our spam is very tasty.
test3.txt
This tests whether we can match at the end when there is no newline
where the line is not ended with a newline character.
Here are some samples of running fgrep on those test files.
Note that echo $? causes the shell to print the exit status of
the last command as a single number on a separate line; thus we can
see the success or failure status of each invocation.
Successful Run with a Single Filename Argument
./fgrep test test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
0
Matching lines printed; exit code 0.
Failed Run with a Single Filename Argument
./fgrep test test2.txt; echo $?
1
No match; exit code 1.
Multiple File Arguments
./fgrep test test1.txt test2.txt; echo $?
test1.txt:This is a test file.
test1.txt:We really like to test our tests.
test1.txt:It's important to have a line ending to test
0
Successful matches in one file; note prepended filename. Exit code 0.
Input Taken from stdin; -i Flag
./fgrep -i test < test1.txt; echo $?
This is a test file.
We really like to test our tests.
It's important to have a line ending to test
Test the line beginning.
0
-i flag ignores case of text in files; successful matches printed; exit code 0.
Using the -n Flag
./fgrep -n -i our test1.txt test2.txt; echo $?
test1.txt:2:We really like to test our tests.
test2.txt:2:Our spam is very tasty.
0
Case ignored (-i); line numbering added with the -n flag; successful matches in two files; line number (and colon) follows filename; exit code 0.
The -q Switch; Successful Match
./fgrep -q -n -i our test1.txt test2.txt; echo $?
0
Case ignored (-i); -n flag is ignored; at least one match found in at least one file (we know that there is one match for our in each file from other runs). No output; exit code 0.
-q Switch; Different Search String
./fgrep -q -n -i spam test1.txt test2.txt; echo $?
0
As previous example, but with different search string (spam). At
least one match found; no output; exit code 0.
-q Switch; Unmatched Search String
./fgrep -q -n -i chocolate test1.txt test2.txt; echo $?
1
New search string chocolate, which isn't in either test file.
Match fails; no output; exit code 1.
-q Flag; Missing File
./fgrep -q -i spam test1.txt test2.txt missing.txt; echo $?
0
Searching in three files, one of which does not exist (missing.txt). At least one match found in at least one file. No output; missing file ignored; exit code 0.
Missing File without -q Flag
./fgrep like test1.txt missing.txt test2.txt; echo $?
test1.txt:We really like to test our tests.
Couldn't open 'missing.txt': No such file or directory
test2.txt:I like spam. But spam doesn't like me.
1
Without -q flag, we see matches are found and printed; this time,
missing file is noticed and error message printed where successful
results would have appeared.
Despite successful matches, error message results in exit code 1.
The -n Flag
./fgrep -n test test1.txt; echo $?
1:This is a test file.
2:We really like to test our tests.
3:It's important to have a line ending to test
0
The -n flag gives us line numbers for matches, but doesn't include the filename when searching a single file. Exit code 0.
The -l Flag
./fgrep -l like test1.txt test2.txt; echo $?
test1.txt
test2.txt
0
The -l flag tells fgrep to only print the filename(s) with successful matches.
Successful matches; exit code 0.
The -l Flag and stdin
./fgrep -l test < test1.txt; echo $?
(standard input)
0
Input coming from stdin via shell expansion, so fgrep doesn't
have any filenames to report; instead it uses (standard input) as
the “filename” when a match is found.
Successful match; exit code 0.
Improper Usage
./fgrep -l -v test < test1.txt; echo $?
Usage: fgrep [-i] [-l] [-n] [-q] pattern [files]
2
Using an unsupported flag (-v) results in fgrep printing the
usage message and exiting with code 2. Note that no attempt at
matching was performed; the bad flag was caught during option
processing.
Argument Processing
In the C language, the main program is given two arguments: argc
and argv. (Many systems also pass a third, envp, but we can get away with ignoring it in almost all
programs.) argc is an integer equal to the number of arguments that were
given on the command line; note that the command name itself is
always the first argument so argc is always nonzero.
The arguments themselves are C-style strings stored in the array
argv. Thus argv[0] is the name of the program (which you can
verify in gdb by typing p argv[0] if you're stopped at a
breakpoint in main).
Similarly, argv[1] is the first argument expressed as a string,
and, since a C string is an array, argv[1][0] is the first character
of the first argument. That can be very useful because if that
character is a dash, the argument must be a switch.
Note that treating every argument that begins with a dash as a switch necessarily implies that the pattern you're searching for cannot begin with a dash.
The standard version of fgrep has a solution to that problem, but
we'll solve it by ignoring it.
As mentioned in the Overview of a Unix Filter section in the lab intro, switches can appear in any order. Thus it is incorrect to write code that assumes that switches will appear in any particular order, such as
if (strcmp(argv[1], "-q") == 0) /* Process -q switch */
if (strcmp(argv[2], "-l") == 0) /* Process -l switch */
That way lies madness. Instead, you should loop over the arguments,
using an if/else if construct, a switch statement, or a
combination of the two to figure out whether the argument is a
switch or the pattern.
For each switch, you should have a corresponding integer variable that is zero if the switch is not present, or nonzero if it appears. Your switch-processing code can then simply set the appropriate variable to 1.
Because nearly every program needs to process arguments and
switches, there have been many attempts to write library functions
that can simplify the task. For C, two of the most popular are
getopt and getopt_long, both of which have extensive manual
pages.
You are welcome to investigate either or both of these options, or to simply write your own argument-processing loop.
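For those curious about the library route, here is a minimal sketch using POSIX getopt. The function name parse_with_getopt and its four output parameters are assumptions for this example. One caveat: the GNU implementation of getopt may reorder arguments so that a switch appearing after the pattern is still treated as a switch, which may or may not be the behavior you want.

```c
#define _POSIX_C_SOURCE 200809L   /* make sure unistd.h declares getopt */
#include <unistd.h>

/* Hypothetical wrapper around getopt. Sets each flag to 0 or 1 and
 * returns the index in argv of the pattern, or -1 on a usage error
 * (undefined switch or no pattern). */
int parse_with_getopt(int argc, char **argv,
                      int *iflag, int *lflag, int *nflag, int *qflag)
{
    int c;

    *iflag = *lflag = *nflag = *qflag = 0;
    opterr = 0;                         /* we print our own usage message */

    while ((c = getopt(argc, argv, "ilnq")) != -1) {
        switch (c) {
        case 'i': *iflag = 1; break;
        case 'l': *lflag = 1; break;
        case 'n': *nflag = 1; break;
        case 'q': *qflag = 1; break;
        default:  return -1;            /* getopt returned '?': bad switch */
        }
    }
    /* optind is now the index of the first non-switch argument. */
    return (optind < argc) ? optind : -1;
}
```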
No matter how you choose to approach argument processing, when you reach a non-switch argument you should assume that it is the pattern and break out of the loop. Any arguments after the pattern will be (expected to be) the names of files to process.
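A hand-written loop along the lines described above might look like the following sketch. The struct options type, its field names, and parse_args are names invented for this example; the sketch also assumes each switch is given as its own argument (so -i -n rather than -in).

```c
#include <string.h>

/* Hypothetical flag set: one int per switch (0 = absent, 1 = present). */
struct options {
    int ignore_case;    /* -i */
    int names_only;     /* -l */
    int line_numbers;   /* -n */
    int quiet;          /* -q */
};

/* Returns the index in argv of the pattern, or -1 on a usage error
 * (undefined switch or no pattern). File names, if any, start at the
 * returned index plus one. */
int parse_args(int argc, char **argv, struct options *o)
{
    int i;

    memset(o, 0, sizeof *o);
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            break;                      /* first non-switch is the pattern */
        if (strcmp(argv[i], "-i") == 0)
            o->ignore_case = 1;
        else if (strcmp(argv[i], "-l") == 0)
            o->names_only = 1;
        else if (strcmp(argv[i], "-n") == 0)
            o->line_numbers = 1;
        else if (strcmp(argv[i], "-q") == 0)
            o->quiet = 1;
        else
            return -1;                  /* undefined switch */
    }
    return (i < argc) ? i : -1;         /* -1 if no pattern was given */
}
```

A caller would print the usage message and exit with status 2 when this returns -1.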
Handling Files
If there are no file arguments provided on the command line, fgrep
should read from standard input (stdin), which is already open by
default and is of type FILE *. But if there are file names on
the command line, you will need to open each file in turn using
fopen (which returns a FILE *) and search through that file.
Thus, it's best to have a helper function (in our sample solution,
we brilliantly called ours fgrep) that accepts a FILE * and some
other arguments, and does the real work.
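The choice between stdin and a named file can be isolated in a small helper, so that the search function only ever sees a FILE *. This is a sketch, not the sample solution: open_input is a hypothetical name, and sending the error message to stderr is an assumption (the message text follows the sample run above).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the stream to search.
 *   name == NULL  -> no file arguments were given, so use stdin
 *   otherwise     -> try to open the named file for reading
 * Returns NULL if the file can't be opened; unless quiet is nonzero,
 * an error message is printed first. */
FILE *open_input(const char *name, int quiet)
{
    FILE *in;

    if (name == NULL)
        return stdin;                   /* already opened by the shell */

    in = fopen(name, "r");
    if (in == NULL && !quiet)
        fprintf(stderr, "Couldn't open '%s': %s\n", name, strerror(errno));
    return in;
}
```

When this returns NULL, the caller should record that an error occurred and move on to the next file. Every stream other than stdin should be closed with fclose once it has been searched.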
The best way to deal with the input file is to read it one line at a time, check for the presence of the search string, and then continue to the next line.
You can use fgets to do just that: it allows you to read from an
arbitrary FILE * and protects you from buffer-overflow attacks.
Keep in mind, however, that you also need to handle arbitrarily long input lines, so you can't just declare a 1000-byte buffer and assume that all lines will be shorter than that. Instead, you should
- Allocate a small buffer and use fgets to read into it.
- If the result doesn't end in a newline, your buffer was too small.
- Expand your buffer by doubling its size.
- Read again, appending the new data to the existing buffer.
- Continue until the buffer is big enough to hold the whole line.
- Do your search on the contents of the buffer.
(You'll probably want to create some helper functions.)
A few notes on this approach:
- In C, you allocate memory using malloc, which accepts a single integer argument that is the amount of memory, in bytes, that you want. malloc returns a pointer to the desired memory, or NULL if you are out of memory.
- You can expand previously allocated memory with realloc. The expanded memory might be in a different place (so you will need to update the pointer to it), but it is guaranteed to contain the data that was in the original. See the man page for realloc for syntax.
- Once you have expanded the buffer, you should keep it at the expanded size. There is no point in freeing it and starting over.
- To make sure your expansion code works, we recommend that you start with a ridiculously tiny buffer (our sample solution used 2 bytes; a single byte is too small). Starting with a small buffer will exercise your expansion code and quickly reveal any bugs.
- Be careful about memory leaks! You can free memory with free.
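The read-and-grow steps described above can be sketched as one helper. This is one possible shape, not the sample solution: read_line, its parameters, and its return convention are assumptions made for this example. Note one extra case the sketch has to handle: the last line of a file may lack a trailing newline, so a missing newline can also mean end of file rather than a full buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one whole line from `in` into the buffer that
 * buf points to, doubling the buffer as needed. The buffer and its
 * capacity persist across calls, so it stays at its expanded size.
 * Start with the buffer pointer set to NULL; free() it when finished.
 * Returns 1 if a line was read, 0 at end of file, -1 if out of memory. */
int read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;

    if (*buf == NULL) {
        *cap = 2;                       /* deliberately tiny, to exercise growth */
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    while (fgets(*buf + len, (int)(*cap - len), in) != NULL) {
        char *bigger;

        len += strlen(*buf + len);
        if (len > 0 && (*buf)[len - 1] == '\n')
            return 1;                   /* the whole line is in the buffer */

        /* No newline yet: double the buffer and append the next chunk. */
        bigger = realloc(*buf, *cap * 2);
        if (bigger == NULL)
            return -1;
        *buf = bigger;
        *cap *= 2;
    }
    return (len > 0) ? 1 : 0;           /* final unterminated line, or EOF */
}
```

A caller would loop with something like while (read_line(in, &buf, &cap) == 1), search buf on each pass, and call free(buf) once after all inputs have been processed. The sketch does not distinguish a read error from end of file; checking ferror(in) afterwards is one way to do that.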