CS70, Spring 2004

Homework Assignment #2

The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, February 4, 2004. The README file is due at midnight on the same day (i.e., the moment Thursday begins). Refer to the homework policies page for general homework guidelines.

This assignment has several purposes:

Overview

It's next term and you have been promoted from innocent CS 70 student to expert CS 70 grader. As usual, the graders are trying to evaluate student programs without adequate supporting software. They would like to have a program that automatically detects basic mechanical problems with submissions.

Prof. Kuenning is currently trying to get money from the Dean to have a student write the final version of this program, with elaborate features and a sexy web interface, over next summer. Meanwhile, however, they need something to help, even if it's basic. Since you were late to a grader meeting, you have been put in charge of writing a first-draft version. Although it must be written quickly, it must be kept modular so that it can be extended into the final elaborate system.

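Your program will analyze the input code and tell the user about:

  1. Lines longer than 80 characters.
  2. Lines containing a string of more than 10 capital letters and embedded whitespace.
  3. Lines that seem to contain a goto statement.
  4. Lines containing a // comment without surrounding whitespace.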

Your code will make use of a function named dumbReadLine, written by some student GHK who graduated long ago. This function might not be what you'd like, but Prof. Kuenning wants you to build only the analysis program now and deal with the input-reader later. You should refrain from pointing out that dumbReadLine is almost identical to the getline function in the C++ library, because the minor differences will actually make your life easier.

Submission Details

Getting Started

As with homework 1, you must use cs70checkout and cs70submit * to manage your code. Start your assignment by going to wherever you keep your CS70 assignments and typing:

cs70checkout hw02

What to Write

Your program must be in one file, named assign_02.cc. Your code must use the dumbReadLine function, supplied in the files dumbreadline.hh and dumbreadline.cc. It must compile with the supplied Makefile. These three files, along with some sample input and output files, will be provided to you when you create your assignment with cs70checkout.

Do not modify any of the three supplied files. Create only the file assign_02.cc and your README file. If you modify any of the supplied files, the graders will replace them with their own copy.

Submitting

Use cs70submit * to submit your code.

Code Details

Your code must process the input line-by-line using the function dumbReadLine. Each type of report should be created by a separate function. Each of these functions should examine only one line of input: none of them requires examining more than one line at a time or maintaining state from one line to the next.

The dumbReadLine function supplies each line as an array of characters (see below). (This is the standard way of representing a string in C, and it is still quite common in C++ programs as well.) Thus, the first character on the line is lineBuffer[0], the second is lineBuffer[1], and so forth. To examine all characters, you will have to write a loop that searches through the buffer until it reaches the '\n' (newline) character at the end of the line.
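A minimal sketch of that loop follows. The function name countRawCharacters is hypothetical, and the parameter is written as const char buffer[] rather than char buffer[1024]; both forms accept the array that dumbReadLine fills. This version counts each TAB as one character; expanding TABs to tab stops is a separate step, described below.

```cpp
// HYPOTHETICAL helper: count the characters on one line by walking the
// buffer until the '\n' that dumbReadLine leaves at the end of the line.
int countRawCharacters(const char buffer[])
{
    int length = 0;
    // Stop at the newline; the '\0' after it never needs to be examined.
    while (buffer[length] != '\n')
        ++length;
    return length;   // TABs are counted as one character each here
}
```

For example, countRawCharacters("int x;\n") returns 6, because the final '\n' is not counted.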

To receive a top grade, your code must:

When counting the number of characters in a line, each TAB character ('\t' in C++) must be converted to the appropriate number of spaces. A TAB in the input moves the cursor to the next column that is a "tab stop." If the cursor is already on a tab stop, TAB moves it to the next one.

The terminology dates back to old mechanical typewriters. Those allowed tab stops to be put in arbitrary columns. In normal computer programming, however, a column is a tab stop if the column is a multiple of the "tab width". (Columns start with 0, so the first column is the first tab stop.) Your program should set the tab width to the industry standard of 8 characters. It must be capable of handling tabs anywhere in a line, no matter how long that line is.

Here is an example of 8-character tab stops:

          1         2         3         4         5
012345678901234567890123456789012345678901234567890123456789
        X       X       X       X       X       X       X
The X's indicate where the cursor would next land if you typed a TAB at the beginning of the line or immediately after each X.

Hint: it is useful to apply the modulo (%) operator to the current column number when calculating tab stops.

The number of characters in a line does NOT count the final end-of-line character.
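Putting the modulo hint and the end-of-line rule together, one possible sketch is shown here. The names TAB_WIDTH and expandedLength are hypothetical. From any column, a TAB advances the cursor by TAB_WIDTH - column % TAB_WIDTH, which lands on the next tab stop even when the cursor is already sitting on one.

```cpp
const int TAB_WIDTH = 8;   // industry-standard tab width

// HYPOTHETICAL helper: length of one line as it would appear on screen,
// with each TAB advanced to the next multiple of TAB_WIDTH.
// The final '\n' is not counted.
int expandedLength(const char buffer[])
{
    int column = 0;                       // next column to be written
    for (int i = 0; buffer[i] != '\n'; ++i) {
        if (buffer[i] == '\t')
            column += TAB_WIDTH - column % TAB_WIDTH;  // jump to next tab stop
        else
            ++column;
    }
    return column;
}
```

For instance, "ab\tc\n" has length 9: "a" and "b" occupy columns 0 and 1, the TAB jumps to column 8, and "c" occupies column 8.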

A sequence of capital letters can contain whitespace (but not line breaks). A sequence of whitespace characters counts as part of the sequence of capitals if there are capital alphabetic characters immediately before and after the whitespace. For the purposes of this test, you should count any whitespace, including TAB, as a single character. (It's hard enough to do without properly expanding TABs, and you've already proven that you can expand tab stops...)
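One way to apply this rule in a single pass is to hold whitespace "on probation": count it, but add it to the run only when another capital letter follows. The sketch below assumes that any character other than a capital or whitespace (lowercase letters, digits, punctuation) ends the run; the name longestCapitalRun is hypothetical.

```cpp
#include <cctype>

// HYPOTHETICAL helper: length of the longest run of capital letters in one
// line, where whitespace (one character each, TABs included) counts only
// when a capital letter appears both before and after it.
int longestCapitalRun(const char buffer[])
{
    int longest = 0;
    int runLength = 0;       // length of the current run, ending at a capital
    int pendingSpaces = 0;   // whitespace seen since the last capital
    for (int i = 0; buffer[i] != '\n'; ++i) {
        // The <cctype> functions expect values representable as unsigned char.
        unsigned char c = buffer[i];
        if (std::isupper(c)) {
            // Pending whitespace joins the run only now that a capital follows.
            runLength += pendingSpaces + 1;
            pendingSpaces = 0;
            if (runLength > longest)
                longest = runLength;
        }
        else if (std::isspace(c) && runLength > 0)
            ++pendingSpaces;
        else {
            // Any other character (or whitespace before any capital) ends the run.
            runLength = 0;
            pendingSpaces = 0;
        }
    }
    return longest;
}
```

Whitespace trailing at the end of a run is never added, because the loop stops at the '\n' before another capital can confirm it. So "a AB C b\n" gives 4 and "AB \n" gives 2.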

Running Your Program

To test your program, you will probably want to run it on one or more test files. To run your program on a file named test.txt, use the following Turing command:

    ./assign_02 < test.txt

This command will provide test.txt to your program on the standard input device. If you want to collect the output into a file, for example testoutput, use:

    ./assign_02 < test.txt > testoutput

Finally, if you want to use diff (see below) to check the output against a sample output file named sampleoutput, use:

    ./assign_02 < test.txt | diff sampleoutput -
If your program runs exactly correctly, this command will produce no output. Otherwise, the output will list all of the differences between your program and the sample output.

Output Format

All output must be written to the standard output device, cout. When your code reports lines with the above properties, the reports must be in the following format:

  1. If a line is longer than 80 characters, produce a message like:
    Line n is too long: m characters.
    where n is the line number (counting from 1) and m is the line length, not including the newline.
  2. If a line contains a string of more than 10 capital letters and whitespace characters in a row, produce a message like:
    Line n has a string of m capital letters and spaces.
    where n is the line number and m is the length of the longest string in the line that is made up solely of uppercase letters and embedded whitespace. (Note that "a ABC b" contains an uppercase string of length 3, because the blanks are not embedded in the uppercase string, but "a AB C b" has an uppercase string of length 4.)
  3. If a line contains "goto", produce a message like:
    Line n seems to contain a goto statement.
    where n is the line number.
  4. If a line contains "//" without surrounding whitespace, produce a message like:
    Line n contains a // comment without surrounding whitespace.
    where n is the line number.
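A sketch of the "//" check follows. The handout does not say what happens when "//" begins the line, so this version ASSUMES that nothing is needed before a "//" in column 0 and only checks the character after it. Following the hint later in this handout, the '\n' that ends the line counts as whitespace. The name hasCrowdedComment is hypothetical.

```cpp
#include <cctype>

// HYPOTHETICAL helper: true if the line contains "//" that is not
// surrounded by whitespace. ASSUMPTION: a "//" at the very start of the
// line needs whitespace only after it.
bool hasCrowdedComment(const char buffer[])
{
    for (int i = 0; buffer[i] != '\n'; ++i) {
        if (buffer[i] == '/' && buffer[i + 1] == '/') {
            // buffer[i + 2] is safe to read: at worst it is the final '\n',
            // which isspace treats as whitespace.
            bool spaceBefore = (i == 0)
                || std::isspace(static_cast<unsigned char>(buffer[i - 1]));
            bool spaceAfter =
                std::isspace(static_cast<unsigned char>(buffer[i + 2]));
            if (!spaceBefore || !spaceAfter)
                return true;   // report once per line, then stop looking
        }
    }
    return false;
}
```

With this reading, "x = 1; // ok" passes, while both "x = 1;// bad" and "x = 1; //bad" are flagged.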

If a line suffers from multiple problems, all problems must be reported, and they must be reported in the order given above. No problem should be reported more than once per line.

The assignment includes some input and output example files to test your program on:

Like the supplied source files, the input and output files for the two test cases are included when you first set up your assignment with cs70checkout.

WARNING: The graders are extremely picky about format. For full credit, your output must exactly match the model solution. Use the program diff (see homework #1) to check for subtle differences.

If diff reports differences, but you can't spot any trouble, try the following variation:

    ./assign_02 < test.txt | diff sampleoutput - | cat -v -e -t
The "| cat -v -e -t" part causes normally invisible things to become visible. Specifically, it replaces all invisible control characters with a two-character sequence starting with a caret (^) [-v], marks the end of each line by appending a dollar sign ($) so that you can see if there are trailing spaces [-e], and represents TABs as ^I so you can tell them from strings of spaces [-t]. Incidentally, you can write this command more briefly as "cat -vet" and remember it as "take the cat to the vet to diagnose problems."

Required Style

The required style for this assignment differs slightly from what will be required in later assignments. Specifically:

Some Useful C++ Hints

I/O

Put the following lines at the top of your program. The #include lines declare, in order, various useful character-analysis utilities, the input and output operations, and the dumbReadLine function. The final using directive lets you write cout and cin, as the examples in this handout do, rather than std::cout and std::cin.

#include <cctype>
#include <iostream>
#include "dumbreadline.hh"
using namespace std;

To print an item foo to standard output, you use the statement cout << foo;. Here are some handy examples of using cout:

   // Write a literal piece of text
   cout << "Test string";
   // Write an integer
   cout << 3;
   // Write the value of a variable x
   cout << x;
   // Start a new output line
   cout << endl;     
   // Compact way to write several values
   cout << "The value of x is " << x << " right now" << endl;

There is much more information on I/O available in the class C++ notes.

The inputs to dumbReadLine are an input stream, an array of characters, and the length of the character array. For this assignment, the input stream will always be cin (standard input) and the array length will always be 1024. For example (bad style alert! -- 1024 is a "magic number"):

    char lineBuffer[1024];
    dumbReadLine(cin, lineBuffer, 1024);

The call to dumbReadLine will fill lineBuffer (which you have previously created) with characters from the input, stopping when it hits the end of a line or the end of the file. When looking through your array, you can tell when you've reached the end of this line because the last two characters will be an end of line ('\n') character followed by a null ('\0') character. You can stop when you hit the '\n'; there is no need to check for the '\0'.

You can pass the contents of lineBuffer to another function as follows:

    // ... use dumbReadLine as above ...
    handyHelperFunction(lineBuffer, ...);

In your helper function, declare things like this (bad style alert!):

void handyHelperFunction(char buffer[1024], ...)
{
    ...
    if (buffer[i] == ...)
        ...
}
Note that you pass the array by name only, without brackets, and that the function declares its formal parameter the same way as the main program did (including the size). There are other ways to accomplish the same result, but this one will work fine for now.

If an input line is longer than the buffer you have given to dumbReadLine, it will generate an error message and halt your program.

When you have read all the lines of the input file and there is nothing left, a call to dumbReadLine will return just as if it had succeeded in reading some additional line. Therefore, immediately after you call dumbReadLine, you must check the status of the input stream, to see whether the dumbReadLine call succeeded in reading a line of data or hit the end of the file. Do this as follows:

    if (!cin)
       // nothing more in the input cin; we have hit the end of the file
or 
    if (cin)
       // cin still has input; we have not yet hit the end of the file

Warning: you can't check the status of cin for EOF until after you have tried to read a line. It's sort of like a blind person at a curb: he doesn't know there's a drop-off until he has tried to put his foot down on the pavement. This is a general principle of C++ I/O that often bites people.

Checking for Whitespace

The C++ utility function isspace determines whether a character is whitespace (including tab, blank, newline, and a few other characters). The function isupper determines whether it is uppercase. The function isalnum checks for alphanumeric characters. Use these functions: don't improvise your own. They are used as follows:

    if (isspace(lineBuffer[i]))
        // The character in lineBuffer[i] is whitespace

For the section of your program that deals with "//" comments, it might help to know that a newline character (end of line) is considered to be whitespace.

There is more information on these operations, as well as on the general subject of C/C++ string processing, available in the class C++ notes. Note that for this assignment you are prohibited from using strcpy or strcasecmp.

Tricks for Dealing With Magic Numbers

It is possible (though not likely) that you will find that you need to use the constant 4 to represent the length of the string goto. It turns out that there's a clean way to avoid building that number into your program. You can use the following construct:

        sizeof "goto" - 1
to generate the proper value. You can even give it a name:
        const int GOTO_SIZE = sizeof "goto" - 1;
Even better, you can also make "goto" a constant:
        const char GOTO_STATEMENT[] = "goto";
        const int GOTO_SIZE = sizeof GOTO_STATEMENT - 1;
so that there is a single point of change in the extremely unlikely event that you want to change the spelling of goto.
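As an illustration of those constants in use, here is a sketch of a goto check that compares the line against GOTO_STATEMENT one character at a time, without strcpy or strcasecmp. It treats any occurrence of "goto" as a match, even inside a longer word or a comment, which matches the deliberately loose wording "seems to contain." The function name containsGoto is hypothetical.

```cpp
const char GOTO_STATEMENT[] = "goto";
const int GOTO_SIZE = sizeof GOTO_STATEMENT - 1;   // 4: excludes the '\0'

// HYPOTHETICAL helper: true if "goto" appears anywhere in the line.
bool containsGoto(const char buffer[])
{
    for (int i = 0; buffer[i] != '\n'; ++i) {
        int matched = 0;
        // The final '\n' never equals a letter of "goto", so this
        // comparison cannot run past the end of the line.
        while (matched < GOTO_SIZE
               && buffer[i + matched] == GOTO_STATEMENT[matched])
            ++matched;
        if (matched == GOTO_SIZE)
            return true;
    }
    return false;
}
```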

What We Expect to See (Grading)

For a passing grade, we expect that:

In particular, it is unacceptable to submit undocumented or very poorly documented code, ignore the formatting guidelines, and/or write one ginormous main function that does most of the work. Merely producing the correct output is not sufficient.

Prof. Kuenning is a nut about spelling (see "ispell -v"). Check out the style guidelines on spelling and the instructions on how to use ispell.

For an "A", we expect that:

Ok, ok. No one's perfect. An "A" submission can have small deviations from the above. But only small ones.


© 2004, Geoff Kuenning

This page is maintained by Geoff Kuenning.