The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, February 16th, 2000. The README file is due at 12 midnight on the same day (i.e., the moment Thursday starts). Refer to the homework policies page for general homework guidelines.
This assignment has several purposes:
Your assignment is to write a program that will read arbitrary-length whitespace-separated information from a file and list it to the standard output. While this may seem like a trivial exercise, you will find that there are a number of difficulties that make it worthy of your talents.
Your submission should consist of three files:
Makefile
make
utility. For this
assignment, a sample
Makefile
is provided for your use. So
long as you name your program assign_03.cc
, as
specified, the Makefile will work for your program.
DO NOT USE YOUR BROWSER'S CUT-AND-PASTE
FACILITY to download the Makefile. Instead, use
a right-click to pop up a menu, and select the
"Save link as..." (in Netscape) or "Save target as..." (in
Internet Explorer) option. If you use cut-and-paste,
your Makefile will not work. If you prefer, the
Makefile, together with the sample input and output files,
can be downloaded as either a ZIP
archive or a gzipped tar archive.
If you modify the provided Makefile, please do not change
the name of the executable generated. It will make the
graders' lives much easier if the executable is named
assign_03
. The provided Makefile does this
automatically for you.
assign_03.cc
README
You should create all of these files in a single directory that is
dedicated to this assignment. You can create a directory using the
mkdir
command. I suggest that you have a "cs70"
directory that will hold all assignments, and then a subdirectory for
each individual assignment. For example, you might type the following
commands:
mkdir cs70 cd cs70 mkdir hw03 cd hw03 emacs assign_03.cc
When you are ready to submit your program, cd
into the
proper directory (e.g., cs70/hw03
) and run the
cs70submitall
command. This command will prompt you for
the assignment number (see the top of this Web page)
and will then capture all of the source files,
Makefiles, and README files in your directory, so BE
CERTAIN that you don't have anything in your directory
besides your assignment.
If you discover a mistake in your program, you can resubmit it using the same command. You can submit as many times as you like; only the last version will be used.
Since the README file is due later than the rest of the assignment,
you may choose to submit it separately. You can do this with the
cs70submit
command:
cs70submit READMEIf you already submitted your code separately, DO NOT use
cs70submitall
to submit the README file, or it will
appear that you missed the deadline even though you were really on
time.
Your program must be able to read arbitrary-length strings from a C++ stream. To do this, you must use the self-expanding array techniques (sometimes called "infinite arrays" or "dynamic arrays") described in Weiss, Section 1.4. You will not know ahead of time how long the maximum string length might be, so you must make an initial guess and then adjust your guess if it turns out to be wrong.
The program will read from standard input (called cin
in
C++) until end-of-file is reached. Each
non-whitespace string that it
sees must be read separately, and then be printed on a separate line,
surrounded by double quotes.
See the sample output for an example of
the output format.
Your program must match this output format exactly when it
writes to cout
. If you want to generate other
output, such as debugging information, you must write it to
cerr
or you will lose points. The graders will be using
automated tools to test your program, and the tools are not forgiving
of variations in the output format.
readString
function
You will find it convenient, both for current development and for
future assignments, to put the string-reading code into a separate
function. I called mine readString
.
Each time the string-reading function is called, it must read a single
nonblank word from the
standard input stream, cin
, after first skipping any
whitespace that precedes that nonblank word. The function must not
skip over any whitespace that follows the word (note that the latter
requirement means that you will usually have to put
back a single whitespace character after reading each nonblank
string.
The function
returns a pointer to the string it created; it is the caller's
responsibility to get rid of the storage allocated by
readString
. At the beginning of your program, you will
have to declare a prototype for the function, which will look
something like this:
char* readString();
You will typically use the function in the following fashion:
char* foo; // ... foo = readString(); if (foo == NULL) { // No more input, handle EOF } // Process the string returned by readString // ... delete[] foo;
To be able to use cin
, you will have to include the
header file iostream.h
. You can do this by
adding the following statement near the top of your program:
#include <iostream.h>Be sure to place the pound sign at the beginning of the line, or the above statement won't work. (You will find that you need to include
iostream.h
in almost every program you ever write.)
If there is no more input (i.e., EOF on the stream you are reading), readString should return a NULL pointer so that its caller can know that there is no more input. You can do so by simply writing:
return NULL;
Since you don't know a priori how much space you will need for
your array, you must allocate it dynamically. There is a simple
implementation of a self-expanding array in Section 1.4 of Weiss,
although you can remove the try/catch
construct because
it's not necessary (it's OK for your program to abort if memory is
exhausted).
In general, you can set up a dynamically-sized array of characters using code similar to the following:
char* handy_array; // ... handy_array = new char[50];After creating the array, you can access elements using the [] notation:
handy_array[4] = 'a'; if (handy_array[i] == 'b') // ...
You will also find it necessary to pass the array between functions. For example, a function that both accepts and returns an array would be prototyped as:
char* handy_function(char* input_array);and might be written:
char* handy_function(char* input_array); { if (input_array[0] == '\0') { char* new_array; new_array = new char[100]; return new_array; } else return input_array; }which is kind of a silly function, but illustrates some of the basic ideas. You can use the general structure of this function as a guideline for creating a function that expands an array so that it is guaranteed to have enough room to hold n characters.
When you are done with an array, you must get rid of it using the
delete[]
operator. For example:
char* temp_array; // ... temp_array = function_that_creates_an_array(); // ... delete[] temp_array;Here, we assume that
function_that_creates_an_array
will
create the array using new
, but that the caller of that
function will take responsibility for deleting it later.
Input and output in C++ are rather different from most other
languages. For the moment, all of your input will be read from the
standard input device, named cin
, and output will be
written to the standard output, cout
. I/O is done using
the >>
and <<
operators. For
example, to print the famous "Hello, world" string, you would write:
cout << "Hello, world" << endl;This says "Write 'Hello, world' to
cout
, and then end the
line (endl
) by writing a newline character." Using
endl
also ensures that the output will be visible to you
immediately, rather than waiting until some future time that is more
convenient for the computer than it is for you.
You can string as many things together as you wish using
<<
, and the things can be of almost any type. No
blanks are produced between the things you write. So, for example,
you could write the alphabet in either of two ways:
cout << "abcdefghijklmnopqrstuvwxyz\n"; cout << 'a' << 'b' << 'c' << 'd' << 'e' << 'f' << 'g' << 'h' << 'i' << 'j' << 'k' << 'l' << 'm' << 'n' << 'o' << 'p' << 'q' << 'r' << 's' << 't' << 'u' << 'v' << 'w' << 'x' << 'y' << 'z' << endl;Obviously, the second method is more work, but it illustrates how to write single characters, which you will find useful. Incidentally, the first statement above uses
\n
instead of
endl
. Both will terminate the line, but \n
doesn't guarantee that you'll see the result immediately.
For this assignment, of course, the big question is "how do I output
the string that is returned by readString
?" Assuming
that readString
returns something of type char *
,
the answer is quite simple:
char* resultOfRead; // ... resultOfRead = readString(); // ... cout << resultOfRead << endl;The above example doesn't include the double quotes required by the assignment. We have to leave you something to figure out on your own!
Naturally, your program must be able to read characters from a stream, and to detect EOF on that stream. You must also be able to tell whitespace characters from nonwhite ones. Fortunately, there are library functions to do all of this for you. It also turns out that you will want one other feature, which is discussed below.
To read a character from an arbitrary stream named
stream
, you have a choice of two
functions. The first is named get
, and it is used like
this:
char ch; stream.get(ch);The alternative is a function named
read
, used like this:
char ch; stream.read(&ch, 1);The major difference between the two is that
read
can
read more than one character, so you must specify the number of
characters that you want. The choice between the two functions is a
matter of style and personal preference. Both return the stream they
were called on, which is useful for EOF testing.
As you read each character, you will need to collect it into a C-style
string. A C-style string is simply an array of characters, with one
sneaky part: following the last character of the string is a so-called
NUL character (one "L"), also known as the "null byte," which is written as
'\0'
. For
example, we could laboriously create the string "Hello, world" as
follows:
char* hello; hello = new char[13]; hello[0] = 'H'; hello[1] = 'e'; hello[2] = 'l'; hello[3] = 'l'; hello[4] = 'o'; hello[5] = ','; hello[6] = ' '; hello[7] = 'w'; hello[8] = 'o'; hello[9] = 'r'; hello[10] = 'l'; hello[11] = 'd'; hello[12] = '\0';Astute readers will note that, although "Hello, world" is only 12 characters, we had to allocate 13 slots in the array. That's because we need to leave an extra slot for the NUL character. Watch out! Forgetting to leave that extra slot is a common mistake.
So what's stream
? For this assignment, it's the built-in
stream named cin
. In a more general case, it can be any
object that has type istream&
. cin
is
one such object. If you wish, you could write your
readString
function like this:
char* readString() { // ... char ch; // ... if (cin.get(ch)) ... }in which case the function would only read from the standard input device. In a more general case, you could write:
char* readString(istream& stream) { // ... char ch; // ... if (stream.get(ch)) ... }and then call it like:
char* nextWord = readString(cin);The choice is up to you (although the second option will slightly simplify your life in future assignments).
Regardless of whether you use read
or get
,
you can test the stream for EOF by simply mentioning it in an
if
statement. If the stream tests as false
,
then the last previous get
or read
failed
and there is no more data. This test can be combined directly into
the I/O operation, as follows:
char ch; if (stream.get(ch)) { // ch is valid here, and has the next character } else { // EOF has been reached, and ch is *NOT* valid here }
Until now, we have been using the term "whitespace" fairly loosely. It turns out that the C++ language provides a strict definition of the term. Whitespace includes blanks, newlines, carriage returns, TAB characters, and several other more obscure ASCII characters that are intended to move the cursor (or print head) without leaving marks on the screen (or paper). Your program must conform to the strict definition.
Fortunately, the header file <ctype.h>
defines all sorts of
useful functions for testing character types. You can learn about
them by typing man ctype
on Turing (or most other Unix
boxes). One of the functions is isspace
, which returns
true if a character is whitespace according to the strict definition.
For example:
#include <ctype.h> // ... void myfunc() { char ch; // ... if (isspace(ch)) { // ch is whitespace: blank, tab, newline, or several others } else { // ch is non-whitespace: control or printing character } }
There is more information on isspace, as well as on the general subject of C/C++ string processing, available in the class C++ notes.
Of course, you can't test to see whether the next input character is
whitespace until you have read it from the stream. That's unfortunate,
because if I ask you to read a string terminated by whitespace, it
would be
incorrect for you to swallow any of the whitespace following
the string. Fortunately, there's a simple solution: the
istream
class (of which cin
is a member)
allows you to put back the last character
you saw, so that the next get
or read
will
re-read that character. (This is only guaranteed to work once before
you read again; you
can't necessarily put back an infinite number of characters.) The
function is named putback
:
istream& putback(char ch);So if I wanted to make sure that the next
get
on
cin
returned an "A", I could do:
cin.putback('A');
In the current assignment, your program will still work right even if you don't put whitespace characters back after you detect them. However, future assignments in CS70, which will make use of the code you are writing now, will depend on getting it right in this program.
There is also a subtle issue relating to the interaction of EOF with putting back characters, but you won't need to worry about it in this assignment.
If you are new to Unix, you may be wondering how you can test your program on an input file. You may also be wondering how you can make it stop if you test it from the terminal.
To run your program, simply type "./assign_03
" (without
the quotes). You can now type test data to your program and see how
it runs. If it misbehaves badly, you can abort it by typing
control-C. If it works right, you can stop it properly by sending an
EOF from the keyboard; this is done by typing control-D at the
beginning of a line.
To test your program with input from a file, type "./assign_03
<
file, where file is the name of an input
file. For example, you might want to try it on the sample input that was used to generate the sample output. (As when downloading the
Makefile, you should download the sample input by right-clicking and selecting
"Save link as..." or "Save target as..." from the pop-up menu.)
If you prefer, the sample input and output files, together with the
Makefile, can be downloaded as either a ZIP
archive or a gzipped tar archive.
You can even use a wonderful mechanism called a pipe to compare your output with the sample output. Download the sample output into a file (use a right-click; it's a good habit to get into). Then type:
./assign_03 < h3-input.txt | diff h3-output.txt -(assuming that you download the samples into files named
h3-input.txt
and h3-output.txt
). If your
program matches the sample exactly, you will see nothing on your
screen. Otherwise, you'll see a list of mismatches, something like
the following:
2c2 < "is" --- > "isis"which means that line 2 of the sample output was "is", but you generated "isis" instead.
The simplest way to debug, of course, is to insert extra output
statements into your program. Get into the habit of using
cerr
for debugging output. For example:
cerr << "Entered readString" << endl;
A far better way to debug is to use a debugger, such as
gdb
or ddd
. A debugger is equivalent to
being able to insert
output statements on the fly, without having to recompile and rerun
your program every time. There is
helpful
information on gdb available online.
Ddd
is a graphical interface to gdb
, and it
is pretty easy to learn.
If you are a big emacs
fan, you may wish to run
gdb
under emacs; a very
brief introduction is available in the class notes pages.
IT IS WELL WORTH YOUR TIME TO LEARN TO USE GDB NOW.
The people who will succeed in this course are the ones who take the
time to learn about tools such as make
and
gdb
early on, when it's relatively easy and there is less
time pressure. If you wait until late in the course, you will be so
worried about deadlines and so intimidated by gdb
that
you'll be afraid to stop and learn it, and as a result you'll take
hours to find bugs that gdb
could spot in minutes.
There are several parts of this assignment that can surprise you. Here are a few:
delete
each piece of memory as soon as you are done with it. The
graders will be checking for this bug.
echo This is a partial line | tr -d \\012 > partial_lineWhen you run your program with input from such a file, it should still list each of the five words correctly.
There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide.
This page maintained by Geoff Kuenning.