CS70, Spring 2000

Homework Assignment #3

The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, February 16th, 2000. The README file is due at 12 midnight on the same day (i.e., the moment Thursday starts). Refer to the homework policies page for general homework guidelines.

This assignment has several purposes:

Begin becoming familiar with the C++ language,
Get some practice with pointers and memory allocation,
Get used to C++ character processing and some of the idiosyncrasies of C++ I/O, and
Develop code that will be useful to you in future assignments.

Take special note of the last point. Many of the assignments in CS70 will build on work done earlier in the term. If you are careful to develop good software now, you will find it much easier to get through the course later.

Overview

Your assignment is to write a program that will read arbitrary-length whitespace-separated information from a file and list it to the standard output. While this may seem like a trivial exercise, you will find that there are a number of difficulties that make it worthy of your talents.

Your submission should consist of three files:

Makefile: A "make file" containing instructions on how to compile your program with the make utility. For this assignment, a sample Makefile is provided for your use. So long as you name your program assign_03.cc, as specified, the Makefile will work for your program. DO NOT USE YOUR BROWSER'S CUT-AND-PASTE FACILITY to download the Makefile. Instead, use a right-click to pop up a menu, and select the "Save link as..." (in Netscape) or "Save target as..." (in Internet Explorer) option. If you use cut-and-paste, your Makefile will not work. If you prefer, the Makefile, together with the sample input and output files, can be downloaded as either a ZIP archive or a gzipped tar archive.
If you modify the provided Makefile, please do not change the name of the executable generated. It will make the graders' lives much easier if the executable is named assign_03. The provided Makefile does this automatically for you.
assign_03.cc: The C++ code for the assignment. For this assignment, you should be able to implement your program using only a single file.
README: A documentation file, as specified in the homework policies page. Note that this file is not due until 3 hours after the other files in the assignment.

You should create all of these files in a single directory that is dedicated to this assignment. You can create a directory using the mkdir command. I suggest that you have a "cs70" directory that will hold all assignments, and then a subdirectory for each individual assignment. For example, you might type the following commands:

    mkdir cs70
    cd cs70
    mkdir hw03
    cd hw03
    emacs assign_03.cc

When you are ready to submit your program, cd into the proper directory (e.g., cs70/hw03) and run the cs70submitall command. This command will prompt you for the assignment number (see the top of this Web page) and will then capture all of the source files, Makefiles, and README files in your directory, so BE CERTAIN that you don't have anything in your directory besides your assignment.

If you discover a mistake in your program, you can resubmit it using the same command. You can submit as many times as you like; only the last version will be used.

Since the README file is due later than the rest of the assignment, you may choose to submit it separately. You can do this with the cs70submit command:

    cs70submit README

If you already submitted your code separately, DO NOT use cs70submitall to submit the README file, or it will appear that you missed the deadline even though you were really on time.

Details

Your program must be able to read arbitrary-length strings from a C++ stream. To do this, you must use the self-expanding array techniques (sometimes called "infinite arrays" or "dynamic arrays") described in Weiss, Section 1.4. You will not know ahead of time how long the maximum string length might be, so you must make an initial guess and then adjust your guess if it turns out to be wrong.

The program will read from standard input (called cin in C++) until end-of-file is reached. Each non-whitespace string that it sees must be read separately, and then be printed on a separate line, surrounded by double quotes. See the sample output for an example of the output format.

Your program must match this output format exactly when it writes to cout. If you want to generate other output, such as debugging information, you must write it to cerr or you will lose points. The graders will be using automated tools to test your program, and the tools are not forgiving of variations in the output format.

The `readString` function

You will find it convenient, both for current development and for future assignments, to put the string-reading code into a separate function. I called mine readString. Each time the string-reading function is called, it must read a single nonblank word from the standard input stream, cin, after first skipping any whitespace that precedes that nonblank word. The function must not skip over any whitespace that follows the word (note that the latter requirement means that you will usually have to put back a single whitespace character after reading each nonblank string.

The function returns a pointer to the string it created; it is the caller's responsibility to get rid of the storage allocated by readString. At the beginning of your program, you will have to declare a prototype for the function, which will look something like this:

    char* readString();

You will typically use the function in the following fashion:

    char* foo;
    // ...
    foo = readString();
    if (foo == NULL) {
        // No more input, handle EOF
    }
    // Process the string returned by readString
    // ...
    delete[] foo;

To be able to use cin, you will have to include the header file iostream.h. You can do this by adding the following statement near the top of your program:

    #include <iostream.h>

Be sure to place the pound sign at the beginning of the line, or the above statement won't work. (You will find that you need to include iostream.h in almost every program you ever write.)

If there is no more input (i.e., EOF on the stream you are reading), readString should return a NULL pointer so that its caller can know that there is no more input. You can do so by simply writing:

    return NULL;

Memory Management

Since you don't know a priori how much space you will need for your array, you must allocate it dynamically. There is a simple implementation of a self-expanding array in Section 1.4 of Weiss, although you can remove the try/catch construct because it's not necessary (it's OK for your program to abort if memory is exhausted).

In general, you can set up a dynamically-sized array of characters using code similar to the following:

    char* handy_array;
    // ...
    handy_array = new char[50];

After creating the array, you can access elements using the [] notation:

    handy_array[4] = 'a';
    if (handy_array[i] == 'b')
        // ...

You will also find it necessary to pass the array between functions. For example, a function that both accepts and returns an array would be prototyped as:

    char* handy_function(char* input_array);

and might be written:

    char* handy_function(char* input_array);
    {
        if (input_array[0] == '\0') {
            char* new_array;
            new_array = new char[100];
            return new_array;
        }
        else
            return input_array;
    }

which is kind of a silly function, but illustrates some of the basic ideas. You can use the general structure of this function as a guideline for creating a function that expands an array so that it is guaranteed to have enough room to hold n characters.

When you are done with an array, you must get rid of it using the delete[] operator. For example:

    char* temp_array;
    // ...
    temp_array = function_that_creates_an_array();
    // ...
    delete[] temp_array;

Here, we assume that function_that_creates_an_array will create the array using new, but that the caller of that function will take responsibility for deleting it later.

Producing output

Input and output in C++ are rather different from most other languages. For the moment, all of your input will be read from the standard input device, named cin, and output will be written to the standard output, cout. I/O is done using the >> and << operators. For example, to print the famous "Hello, world" string, you would write:

    cout << "Hello, world" << endl;

This says "Write 'Hello, world' to cout, and then end the line (endl) by writing a newline character." Using endl also ensures that the output will be visible to you immediately, rather than waiting until some future time that is more convenient for the computer than it is for you.

You can string as many things together as you wish using <<, and the things can be of almost any type. No blanks are produced between the things you write. So, for example, you could write the alphabet in either of two ways:

    cout << "abcdefghijklmnopqrstuvwxyz\n";
    cout << 'a' << 'b' << 'c' << 'd' << 'e' << 'f' << 'g' << 'h'
      << 'i' << 'j' << 'k' << 'l' << 'm' << 'n' << 'o' << 'p' << 'q'
      << 'r' << 's' << 't' << 'u' << 'v' << 'w' << 'x' << 'y' << 'z'
      << endl;

Obviously, the second method is more work, but it illustrates how to write single characters, which you will find useful. Incidentally, the first statement above uses \n instead of endl. Both will terminate the line, but \n doesn't guarantee that you'll see the result immediately.

For this assignment, of course, the big question is "how do I output the string that is returned by readString?" Assuming that readString returns something of type char *, the answer is quite simple:

    char* resultOfRead;
    // ...
    resultOfRead = readString();
    // ...
    cout << resultOfRead << endl;

The above example doesn't include the double quotes required by the assignment. We have to leave you something to figure out on your own!

Functions for processing the input

Naturally, your program must be able to read characters from a stream, and to detect EOF on that stream. You must also be able to tell whitespace characters from nonwhite ones. Fortunately, there are library functions to do all of this for you. It also turns out that you will want one other feature, which is discussed below.

Reading characters

To read a character from an arbitrary stream named stream, you have a choice of two functions. The first is named get, and it is used like this:

    char ch;
    stream.get(ch);

The alternative is a function named read, used like this:

    char ch;
    stream.read(&ch, 1);

The major difference between the two is that read can read more than one character, so you must specify the number of characters that you want. The choice between the two functions is a matter of style and personal preference. Both return the stream they were called on, which is useful for EOF testing.

As you read each character, you will need to collect it into a C-style string. A C-style string is simply an array of characters, with one sneaky part: following the last character of the string is a so-called NUL character (one "L"), also known as the "null byte," which is written as '\0'. For example, we could laboriously create the string "Hello, world" as follows:

    char* hello;
    hello = new char[13];
    hello[0] = 'H';
    hello[1] = 'e';
    hello[2] = 'l';
    hello[3] = 'l';
    hello[4] = 'o';
    hello[5] = ',';
    hello[6] = ' ';
    hello[7] = 'w';
    hello[8] = 'o';
    hello[9] = 'r';
    hello[10] = 'l';
    hello[11] = 'd';
    hello[12] = '\0';

Astute readers will note that, although "Hello, world" is only 12 characters, we had to allocate 13 slots in the array. That's because we need to leave an extra slot for the NUL character. Watch out! Forgetting to leave that extra slot is a common mistake.

So what's stream? For this assignment, it's the built-in stream named cin. In a more general case, it can be any object that has type istream&. cin is one such object. If you wish, you could write your readString function like this:

    char* readString()
    {
        // ...
        char ch;
        // ...
        if (cin.get(ch)) ...
    }

in which case the function would only read from the standard input device. In a more general case, you could write:

    char* readString(istream& stream)
    {
        // ...
        char ch;
        // ...
        if (stream.get(ch)) ...
    }

and then call it like:

    char* nextWord = readString(cin);

The choice is up to you (although the second option will slightly simplify your life in future assignments).

Testing for EOF

Regardless of whether you use read or get, you can test the stream for EOF by simply mentioning it in an if statement. If the stream tests as false, then the last previous get or read failed and there is no more data. This test can be combined directly into the I/O operation, as follows:

    char ch;
    if (stream.get(ch))
        {
        // ch is valid here, and has the next character
	}
    else
        {
        // EOF has been reached, and ch is *NOT* valid here
	}

Testing for whitespace

Until now, we have been using the term "whitespace" fairly loosely. It turns out that the C++ language provides a strict definition of the term. Whitespace includes blanks, newlines, carriage returns, TAB characters, and several other more obscure ASCII characters that are intended to move the cursor (or print head) without leaving marks on the screen (or paper). Your program must conform to the strict definition.

Fortunately, the header file <ctype.h> defines all sorts of useful functions for testing character types. You can learn about them by typing man ctype on Turing (or most other Unix boxes). One of the functions is isspace, which returns true if a character is whitespace according to the strict definition. For example:

    #include <ctype.h>
    // ...
    void myfunc()
    {
        char ch;
        // ...
        if (isspace(ch)) {
            // ch is whitespace: blank, tab, newline, or several others
        }
        else {
            // ch is non-whitespace: control or printing character
        }
    }

There is more information on isspace, as well as on the general subject of C/C++ string processing, available in the class C++ notes.

Putting things back

Of course, you can't test to see whether the next input character is whitespace until you have read it from the stream. That's unfortunate, because if I ask you to read a string terminated by whitespace, it would be incorrect for you to swallow any of the whitespace following the string. Fortunately, there's a simple solution: the istream class (of which cin is a member) allows you to put back the last character you saw, so that the next get or read will re-read that character. (This is only guaranteed to work once before you read again; you can't necessarily put back an infinite number of characters.) The function is named putback:

    istream& putback(char ch);

So if I wanted to make sure that the next get on cinreturned an "A", I could do:

    cin.putback('A');

In the current assignment, your program will still work right even if you don't put whitespace characters back after you detect them. However, future assignments in CS70, which will make use of the code you are writing now, will depend on getting it right in this program.

There is also a subtle issue relating to the interaction of EOF with putting back characters, but you won't need to worry about it in this assignment.

Running the program

If you are new to Unix, you may be wondering how you can test your program on an input file. You may also be wondering how you can make it stop if you test it from the terminal.

To run your program, simply type "./assign_03" (without the quotes). You can now type test data to your program and see how it runs. If it misbehaves badly, you can abort it by typing control-C. If it works right, you can stop it properly by sending an EOF from the keyboard; this is done by typing control-D at the beginning of a line.

To test your program with input from a file, type "./assign_03 < file, where file is the name of an input file. For example, you might want to try it on the sample input that was used to generate the sample output. (As when downloading the Makefile, you should download the sample input by right-clicking and selecting "Save link as..." or "Save target as..." from the pop-up menu.) If you prefer, the sample input and output files, together with the Makefile, can be downloaded as either a ZIP archive or a gzipped tar archive.

You can even use a wonderful mechanism called a pipe to compare your output with the sample output. Download the sample output into a file (use a right-click; it's a good habit to get into). Then type:

    ./assign_03 < h3-input.txt | diff h3-output.txt -

(assuming that you download the samples into files named h3-input.txt and h3-output.txt). If your program matches the sample exactly, you will see nothing on your screen. Otherwise, you'll see a list of mismatches, something like the following:

    2c2
    < "is"
    ---
    > "isis"

which means that line 2 of the sample output was "is", but you generated "isis" instead.

Debugging

The simplest way to debug, of course, is to insert extra output statements into your program. Get into the habit of using cerr for debugging output. For example:

    cerr << "Entered readString" << endl;

A far better way to debug is to use a debugger, such as gdb or ddd. A debugger is equivalent to being able to insert output statements on the fly, without having to recompile and rerun your program every time. There is helpful information on gdb available online. Ddd is a graphical interface to gdb, and it is pretty easy to learn. If you are a big emacs fan, you may wish to run gdb under emacs; a very brief introduction is available in the class notes pages.

IT IS WELL WORTH YOUR TIME TO LEARN TO USE GDB NOW. The people who will succeed in this course are the ones who take the time to learn about tools such as make and gdb early on, when it's relatively easy and there is less time pressure. If you wait until late in the course, you will be so worried about deadlines and so intimidated by gdb that you'll be afraid to stop and learn it, and as a result you'll take hours to find bugs that gdb could spot in minutes.

Tricky Stuff

There are several parts of this assignment that can surprise you. Here are a few:

C++ strings must be null-terminated. Your string-reading function must allow space for the null terminator, and must append it to every string.
It's very easy to have a memory leak. Make sure you delete each piece of memory as soon as you are done with it. The graders will be checking for this bug.
Be sure your string-reading function skips the whitespace before the string, and leaves the whitespace after it undisturbed. Failure to follow this rule will introduce subtle (or perhaps unsubtle!) bugs in later programs.
If you want maximum credit, your program should behave properly with a file that has no whitespace before EOF. Since a newline is whitespace, this means a file that has an incomplete last line. One way to create such a file is to type:
```
    echo This is a partial line | tr -d \\012 > partial_line
```
When you run your program with input from such a file, it should still list each of the five words correctly.

Other Resources

There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide.

This page maintained by Geoff Kuenning.