CS70, Spring 2000

Homework Assignment #8

The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, April 12, 2000. The README file is due at 12 midnight on the same day. Refer to the homework policies page for general homework guidelines.

The primary purposes of this assignment are:

Overview

Your assignment is to implement a simple encryption program. To make the assignment both interesting and challenging, you will be required to use certain data structures and new techniques.

The cryptosystem that your program will implement is called a Vignère cipher. It is named after Blaise de Vignère, although it is really a corruption of a much more secure cipher he invented in 1585. The Vignère cipher is based the earlier Caesar cipher, which rotates letters through the alphabet. For example, a Caesar rotation of 1 letter would replace "A" by "B", "B" by "C", and so forth, wrapping around to replace "Z" by "A". A rotation of 2 would replace "A" by "C", etc.

Caesar used a constant rotation for his encoding, which made decryption quite simple. The variation named after Vignère modified the scheme by varying the rotation for each successive letter of the message. For example, the first letter might be rotated by 14 positions, the second by 5, and so forth. The pattern of rotation is controlled by a key, which is expressed as a simple word or phrase. The key is repeated when necessary to make it match up with the message.

An example may help explain this further. Suppose our key is "ALPHA" and we wish to encrypt the message "NOW IS THE TIME FOR CS FUN." Ignoring spaces, we can write the key and the message lined up as follows:

    ALPHAALPHAALPHAALPHA
    NOWISTHETIMEFORCSFUN
We will let "A" represent rotation by zero positions, "B" mean 1, and so forth. Then the encryption of the above message can be written directly below it:
    ALPHAALPHAALPHAALPHA
    NOWISTHETIMEFORCSFUN
    --------------------
    NZLPSTSTAIMPUVRCDUBN

It turns out that cryptographers traditionally break messages into 5-character groups to make things a bit easier to work with, so the pencil-and-paper method of doing the above would generate:

    ALPHA ALPHA ALPHA ALPHA
    NOWIS THETI MEFOR CSFUN
    -----------------------
    NZLPS TSTAI MPUVR CDUBN

When the message is decrypted, it is up to the recipient to figure out where the blanks and punctuation should have been.

We will take advantage of the computer's flexibility by implementing a slight variation on the Vignère cipher. Instead of encrypting a 26-letter alphabet, we will add support for blanks and encrypt 27 symbols: A through Z, plus blanks. However, we will keep the 5-character grouping feature, so we can't produce blanks in the encrypted output. Instead, our output alphabet will use a period to represent the 27th character. The message above, when encrypted through the sample solution with the key "ALPHA", then becomes "NZKGI SKHOE .DXTE .QCY. CCOMU NK".

Incidentally, the Vignère cipher is not a secure method of encryption, except in one special case. Modern cryptographers can decode it with very little effort, so you should not try to use it to protect anything important. (The special case is that if the key is at least as long as the message, the key is a truly random string of characters rather than an English word or phrase, and the key is never used again, then the Vignère method reduces to something called a "one-time pad", which is the only provably secure encryption method.)

Your encryption program will prompt the user for a password, and then encode or decode a message contained in a file. There are some very specific requirements for how the program operates, designed so that your functions could conceivably be moved into a different program (i.e., a mail reader) someday if you wished.

High-Level Interface

Your program will follow a fairly standard Unix interface style. There will be two ways to invoke your program, depending on whether you are encrypting or decrypting information.

Encryption

To encrypt, one can use:

    ./assign_08 -e file
	or
    ./assign_08 -e -g 5 file
	or
    ./assign_08 -g 5 -e file
The -e indicates that you want to encrypt file and write the result to cout. The optional -g switch specifies a grouping factor (i.e., generate code groups of 5 characters at a time). Your program should not assume that switches appear in any particular order. Refer to homework #7 for more information on how to process command-line arguments.

It turns out (try running "ps auxw") that it is not a good idea to give passwords on the command line. Instead, your program should prompt for a password on cerr, using the string "Password: " (with a trailing blank, and no newline), and read a password from cin. The password should be a word or phrase, terminated by a newline character.

In a real encryption program, you would turn off echo on the terminal so that nobody could look over the user's shoulder and get the password, but that's a big pain for a CS70 assignment, so your program should ignore that detail. The actual password will be somewhat modified from what the user types by converting lowercase to uppercase and changing non-alphabetic characters to blanks (the same rules will be applied to the message to be encrypted).

If the -g switch is not given, the output of the program should be a single line. If -g appears, the output should be divided into groups of the specified number of characters, with each group separated by a blank. When groups are being generated, your program should never generate an output line longer than 70 characters (unless the argument to -g is itself greater than 70). There should be no blanks at the end of the output lines.

Since we are working with a 27-character alphabet ("A" through "Z" plus a blank), your program will not be able to deal nicely with lowercase characters and punctuation. Therefore, lowercase characters should be converted to uppercase, and all non-alphabetic characters should be treated as blanks. For example, the string "CS 70 is really, really FUN!" would be encrypted exactly the same as if it read as follows (the line of dots is there to help you see all the blanks):

    CS    IS REALLY  REALLY FUN 
    ...........................

Because blanks are used for grouping, they cannot be part of your output alphabet. Instead, the meaningful part of the output of an encryption run should be selected from "A" through "Z" and the period. The period should be considered to either precede "A" or follow "Z" (the choice is yours, and the results should be equivalent in either case).

Decryption

Decryption is simpler than encryption. There is only one way to decrypt:

    ./assign_08 -d file

The specified file should be some previous output of your program. As with encryption, you should prompt the user for the password. The decrypted message should be written to cout. Since all formatting and punctuation have been lost, you should write it as a single long line, and let the user worry about figuring it out.

Error Checking

Your program should verify that its arguments are correct. This includes ensuring that exactly one of -d and -e are specified, making sure that -g is not given with -d, making sure that -g has an argument, ensuring that a file to be encrypted or decrypted is given on the command line, and making sure that no illegal switches are given. If any or these rules are violated, you should print a usage message similar to the following:

    Usage: ./assign_08 {-e|-d [-g n]} file

Submitting

Your submission should consist of a number of files:

Makefile
A "make file" containing instructions on how to compile your program with the make utility. For this assignment, you must produce your own Makefile. I suggest that you start with one from a previous assignment. You will be graded on your Makefile, so be sure you modify it and test it.

The makefile you provide must produce an executable named assign_08.

assign_08.cc
The C++ code for your main program assignment.
*.hh, *.cc, *.icc
Header and source files containing the classes you implement. Some of these will be lifted directly from previous assignments, or will be extended versions of classes in previous assignments. It is up to you to choose the names for these files.
README
A documentation file, as specified in the homework policies page. Note that this file is not due until 3 hours after the other files in the assignment.

If you wish, you can create other files to help you develop this assignment, but it is not necessary.

When you are ready to submit your program, cd into the proper directory (e.g., cs70/hw08) and run the cs70submitall command. This command will prompt you for the assignment number (see the top of this Web page) and will then capture all of the source files, Makefiles, and README files in your directory, so BE CERTAIN that you don't have anything in your directory besides your assignment.

If you discover a mistake in your program, you can resubmit it using the same command. You can submit as many times as you like; only the last version will be used.

Since the README file is due later than the rest of the assignment, you may choose to submit it separately. You can do this with the cs70submit command:

    cs70submit README
If you already submitted your code separately, DO NOT use cs70submitall to submit the README file, or it will appear that you missed the deadline even though you were really on time.

Conversion to Standard Format

We have already discussed most of the high-level interface to your program. It should accept a message to be encrypted in mixed case, converting lowercase characters to uppercase and converting non-alphabetic characters to blanks. The <ctype.h> header file defines a couple of functions that will be useful in this regard:

isalpha(ch)
returns true if the character ch is alphabetic ("A" to "Z" or "a" to "z") and false otherwise.
toupper(ch)
returns the uppercase equivalent of ch if ch is alphabetic. The result is unreliable if ch is not alphabetic.

Doing Arithmetic Operations on Characters

This assignment would be much easier if characters were encoded in a friendly fashion. For example, if the letter 'A' were represented by the number 0, 'B' by 1, and so forth up to 'Z' = 25, with 26 representing a blank, it would be relatively easy to write the encryption code. Unfortunately, 'A' is decimal 65. However, there is an easy way to solve this problem: do arithmetic on characters. As an example, you can convert a character ch to the 'A' = 0 scheme with the following code:

    char convertedCharacter = ch - 'A';
This trick will work only if ch is one of the uppercase letters 'A' through 'Z'. It will not work if ch is lowercase, a blank, or some other special characters.

In the same way, you convert a number between 0 and 25 back to an uppercase letter with:

    ch = convertedCharacter + 'A';
Again, this will only work if convertedCharacter is 0 through 25. If it is some other value, it will not generate a valid letter.

Some of you will notice that the above code will work only on computers that use ASCII or a similar encoding. No problem; it's OK if your program only works on those computers.

Required Techniques and Data

For this assignment, you will be required to use a number of data structures, data elements, and techniques that are not a direct consequence of the external interface requirements. The purpose of these extra restrictions is to force you to get practice with a number of important C++ data structures and techniques.

Required Techniques

There are two obvious ways in which a simple encryption program might work. Both ways assume that you already have the password stored internally. The first approach is to read a single character at a time from the input file, encrypt it, and write it to the output.

The second approach is to read the entire input file into a giant internal string. After reading the input, you can iterate through the string, encrypt each character, and store the encrypted version back into the string (modifying the character in-place). Finally, you can write the encrypted string to the output. You must use this second approach in this assignment. This is partly because it will give you more practice in using iterators, and partly because it will make your code cleaner.

Required Data Structures

Your solution to this assignment must make use of a rather interesting string class that stores a string as a linked list of fixed-size "chunks," each of which is four characters long. Such a class is a compromise between storing the string as an array (which makes inserting characters expensive) and storing it as a linked list of single characters (which wastes memory).

A string is represented as a linked list. A small string would fit in a single list element. If the string is too large to fit into a single piece, you create a second piece and then tie it together with the first, using the linked list as the underlying structure.

To make the above design work, you will need to add an iterator to your list class from homework #7.

In case the above description is not clear, here is an example of the private data from my version of the structure that is used to store the individual pieces of the string:

    class Chunk
        {
        // ...
        private:
            unsigned int        length;
            char                value[CHUNKSIZE];
        };
where CHUNKSIZE is a constant giving the number of characters in each piece (i.e., 4). I can then declare the entire string as a list using something like List<Chunk> string in the private data section of my ListString class.

Operations in the String Class

To the outside world, your string class should just look like something that stores strings. The user should not be able to tell, based on the interface, that the strings are stored in pieces instead of as a single array of characters. This has two implications:

  1. You should support most of the "standard" operations, and
  2. Your private data should be private and not visible to the outside world in any way.

The exact set of operations you choose to support is up to you, and to some extent it depends on what your program needs. I found the following operations to be minimally necessary in my own implementation:

With the exception of the get-length function, I chose to implement all of the above as overloaded operators. In addition to the above, list, I implemented a number of functions, including general concatenation and all of the Boolean comparison operators, just in case I needed them. Your mileage may vary, of course.

Required Functions

The List Iterator

The string iterator required below must be built on top of a list iterator for your list class. To make that work, you will need a list iterator, of course. To save you time and unnecessary effort, we have provided a working list iterator in the form of a header file and icc file. This iterator is designed to work with the templated list class handed out in class (header, implementation). To make the iterator work, you will have to add the declaration:
    friend ListIterator<DATA>;
to the header file in two places: once inside the List class, (just before the public declaration is a good place) and once inside the nested Element class, just after the existing friend declaration for List.

Note that iterator classes are not normally kept in separate header files, because it is a bit of a nuisance to have to "#include" two header files to get both the class and the iterator. We have provided the source in this form only because we are giving the iterator separately. You may wish to copy the code into your own list class.

Also, if your list class isn't named List, or if your list-element class isn't List::Element, you will have to make appropriate changes in the iterator.

The String Iterator

The main reason you need a list iterator is so that you can build upon it to make an iterator for the string-as-list class. A string iterator is required for this assignment. Your string iterator should be built on top of the list iterator; it should have no direct knowledge of the structure of the list. I found it convenient to have a list iterator as a private data member in my string-iterator class; I called it subIterator because it is a subsidiary iterator that is hidden from the outside world.

When you are done building the string iterator, you should be able to do something like this (assuming your string class is called ChunkyString and the iterator is ChunkyStringIterator):

    ChunkyString stuff;
    // .. put characters into the string
    for (ChunkyStringIterator i;  i;  i++) {
        if (*i == 'a')
            cout << "I found an A in the string\n";
    }

Note that the user of the ChunkyString has no knowledge of the fact that there is a list hiding underneath. If your main program even mentions the List class or the ListIterator class, you have taken the wrong approach and you will loas points.

Some Samples

Here is a trivial sample input file, and the output it generates when encrypted with -g 5 and the pass phrase "cs fun". If that output file is fed back into the decryption routine, it generates a slightly modified version of the original. Note that the decrypted version has a blank at the end, which did not appear in the original. Why?

Encrypting a different input file with the pass phrase "My roommate never studies, why should I?" but no -g switch produces a single very long output line. The decrypted version of the file demonstrates that a certain amount of (presumably) useful information has been lost.

Finally, to make the assignment interesting, here is an encrypted file for you to decode once your program is working correctly. The file was encrypted with the pass phrase "When I get my program working I am going to get some sleep". (Note that there is no trailing period in the pass phrase; it consists solely of alphabetic characters and blanks.)

As usual, the sample source and input/output files can also be downloaded as a ZIP archive or gzipped tar archive.

Tricky Stuff

As usual, there are parts of this assignment that contain traps. Here are a few:


Other Resources

There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide.


This page maintained by Geoff Kuenning.