CS 70, Fall 2000
Assignment 7: DNA Recombination

The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, November 8th, 2000. As usual, the README file is due at 12 midnight on the same day (i.e., the moment Thursday starts). Refer to the homework policies page for general homework guidelines.

The primary purpose of this assignment is to get you used to writing C++ iterators. You will also be developing a preliminary list class. Both the list class and the iterator for it will be useful to you in later assignments.

Note: the list class you develop in this assignment will be central to several future assignments. Make sure you develop it well and debug it thoroughly.

Overview

One of the more creative approaches to artificial intelligence is the genetic algorithm, invented by Prof. John Holland of the University of Michigan.

In brief, a genetic algorithm simulates the process of evolution by applying the usual rules of genetics and natural selection. In real life, natural selection's primary goal is the continuation of the species, and organisms that achieve that goal tend to be propagated. In a genetic algorithm, on the other hand, the primary goal is to satisfy a "fitness function" chosen by the programmer. For example, a simple fitness function might interpret the genes of an organism as the value of x in a complicated equation. The natural-selection process could then be tuned to prefer organisms that generate an output near zero, so that the survivors would eventually produce a solution to the equation.

Genetic algorithms were the first step in the current research area called "artificial life", and they have been used to successfully solve many problems that were otherwise intractable.

A complete genetic algorithm is too complex for a CS 70 assignment, but we can implement some of the core functions relatively easily. There are three basic processes in evolution: mutation, recombination, and selection. Mutation involves selecting a gene site and modifying it in some fashion, usually by replacing it with another gene. Mutation is very rare both in real life and in genetic algorithms.
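To make the idea of mutation concrete, here is a minimal sketch that works on a std::string rather than on the linked list this assignment requires. The function name mutate and its parameters are hypothetical and are not part of the supplied driver; the sketch only illustrates "select a gene site and replace it with another gene."

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the assignment's interface):
// return a copy of `genes` in which the gene at position `site`
// has been replaced by `newGene`.  The original string is unchanged.
std::string mutate(const std::string& genes, std::size_t site, char newGene)
{
    assert(site < genes.size());   // the site must name an existing gene
    std::string child = genes;     // copy the parent's genes
    child[site] = newGene;         // overwrite the single chosen site
    return child;
}
```

With a linked list the same operation amounts to walking to the chosen element and overwriting its character, which is one reason a writable iterator is convenient.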

Recombination is the most important process in generating new organisms. It involves taking two gene strings (usually from two parent organisms), cutting them both at the same point, and re-splicing them so that the head of the result comes from one parent and the tail from the other. Real genetic algorithms usually generate two children in this process, and may splice at more than one point, but we'll simplify things in our implementation.
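The simplified, single-child, single-cut recombination described above can be sketched as follows. Again this uses std::string only to show the idea; the name recombine and the parameter names are hypothetical, and your implementation will operate on CharList objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the assignment's interface):
// cut both parents at index `cut` and build one child whose head
// (positions 0 .. cut-1) comes from `mother` and whose tail
// (positions cut .. end) comes from `father`.
std::string recombine(const std::string& mother,
                      const std::string& father,
                      std::size_t cut)
{
    // The cut point must lie within both gene strings.
    assert(cut <= mother.size() && cut <= father.size());
    return mother.substr(0, cut) + father.substr(cut);
}
```

Note that a cut at 0 simply copies the second parent, and a cut at the end of both strings copies the first; these boundary cases are worth testing in a list-based version as well.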

The final step, selection, will not be implemented in this assignment. Selection involves evaluating the organisms according to some criterion (the "fitness function") and choosing the ones that are most successful. In real life, selection is the harsh process of "survival of the fittest." In a genetic algorithm, the same method is used: the least fit organisms are discarded (i.e., killed) without being allowed to reproduce. As in real life, there is some randomness, so that a somewhat unfit organism has a chance of surviving even when a more fit one is discarded. This randomness turns out to be important to the success of the method, since any two slightly unfit parents might (through recombination) generate an extremely fit child.

Because we will not have time to implement an entire genetic algorithm, we will limit ourselves to the recombination and mutation functions. However, we will build them using a relatively general approach, so that the code we write could later be incorporated into a larger program.

Data Structures

CharList

The gene lists will be represented using a linked list. Each element in the list will contain only a single character, plus a link to the next element. The list must have a separate header that is not a plain element, which means that you must implement two classes (the header and the element). The cleanest approach is to make the element a nested private class of the header, so that only the header (CharList) is visible from outside.
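One possible layout for the header/element arrangement is sketched below. It is an illustration only: the names Element, head_, length_, and length() are assumptions, copying is simply disabled to keep the sketch short, and the sketch uses nullptr and "= delete", which assume a C++11 compiler. The functions the driver actually requires are not shown here.

```cpp
#include <cassert>
#include <cstddef>

class CharList {
public:
    CharList() : head_(nullptr), length_(0) {}
    ~CharList() { while (!isEmpty()) popHead(); }

    // Copying is disabled in this sketch; a real list would probably
    // provide a deep-copying copy constructor and assignment operator.
    CharList(const CharList&) = delete;
    CharList& operator=(const CharList&) = delete;

    bool isEmpty() const { return head_ == nullptr; }
    std::size_t length() const { return length_; }   // hypothetical extra

    // Add a gene at the front of the list.
    void pushHead(char value)
    {
        head_ = new Element(value, head_);
        ++length_;
    }

    // Remove the front gene and return it; the list must not be empty.
    char popHead()
    {
        assert(!isEmpty());
        Element* old = head_;
        char value = old->value;
        head_ = old->next;
        delete old;
        --length_;
        return value;
    }

private:
    // The element type is nested and private, so only CharList
    // (and any class it befriends) can name it.
    struct Element {
        Element(char v, Element* n) : value(v), next(n) {}
        char value;      // the single gene stored in this element
        Element* next;   // link to the next element, or nullptr at the tail
    };

    Element* head_;        // first element, or nullptr if the list is empty
    std::size_t length_;   // number of elements currently in the list
};
```

Because Element is private, client code such as the driver can only manipulate the list through CharList itself, which is the point of the nested-class approach.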

Your linked list must be named CharList and must support the following operations. Note that, since the main driver program is supplied, the function names cannot be changed.

In addition, you must implement an output operator (operator<<) for CharList. I recommend the technique described in Weiss: provide a public print function, and have operator<< call print.
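The print/operator<< technique looks roughly like the following sketch. The list here is a stripped-down stand-in so that the example compiles on its own. The output format (genes written head to tail with no separators) is an assumption, as is the helper render, which exists only to show the operator in use; check the sample output in assign_07.cc for the format the driver actually expects.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

class CharList {
public:
    CharList() : head_(nullptr) {}
    ~CharList() { while (head_ != nullptr) popHead(); }
    CharList(const CharList&) = delete;             // copying omitted here
    CharList& operator=(const CharList&) = delete;

    void pushHead(char value) { head_ = new Element{value, head_}; }

    void popHead()
    {
        assert(head_ != nullptr);
        Element* old = head_;
        head_ = old->next;
        delete old;
    }

    // Public print function: writes the genes from head to tail.
    // Writing them with no separators is an assumption of this sketch.
    void print(std::ostream& out) const
    {
        for (const Element* e = head_; e != nullptr; e = e->next)
            out << e->value;
    }

private:
    struct Element { char value; Element* next; };
    Element* head_;
};

// The output operator needs no access to private members:
// it just forwards to print and returns the stream for chaining.
std::ostream& operator<<(std::ostream& out, const CharList& list)
{
    list.print(out);
    return out;
}

// Hypothetical helper showing the operator in use with any ostream.
std::string render(const CharList& list)
{
    std::ostringstream buffer;
    buffer << list;
    return buffer.str();
}
```

The advantage of this arrangement is that operator<< does not have to be a friend of CharList, since print is an ordinary public member.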

Finally, you may find it helpful to implement a few other standard list functions: pushHead, popHead, isEmpty, and possibly popTail. Some of these may be useful in this assignment, while others can be useful in the future. However, only the list above is absolutely required.

CharListIterator

You must also implement an iterator for CharList, which must be named CharListIterator. The iterator must support the following functions at a minimum:

In addition, you may wish to support a copy constructor, assignment operator, and postincrement operator. It would not be appropriate to implement operator->, since char is not a class.
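As a rough illustration of how an iterator can reach into the list's private elements, here is a self-contained sketch. The member names used here (isValid, current_, and the constructor taking a CharList&) are hypothetical and may not match the functions the driver requires; the list is again a minimal stand-in.

```cpp
#include <cassert>

class CharListIterator;   // declared early so CharList can befriend it

class CharList {
public:
    CharList() : head_(nullptr) {}
    ~CharList()
    {
        while (head_ != nullptr) {
            Element* old = head_;
            head_ = old->next;
            delete old;
        }
    }
    CharList(const CharList&) = delete;             // copying omitted here
    CharList& operator=(const CharList&) = delete;

    void pushHead(char value) { head_ = new Element{value, head_}; }

private:
    friend class CharListIterator;   // the iterator may see Element and head_
    struct Element { char value; Element* next; };
    Element* head_;
};

class CharListIterator {
public:
    // Start at the head of the given list.
    explicit CharListIterator(CharList& list) : current_(list.head_) {}

    // True while the iterator refers to an element (hypothetical name).
    bool isValid() const { return current_ != nullptr; }

    // Dereference: a reference to the gene, so callers can read or modify it.
    char& operator*() const
    {
        assert(isValid());
        return current_->value;
    }

    // Preincrement: advance to the next element and return this iterator.
    CharListIterator& operator++()
    {
        assert(isValid());
        current_ = current_->next;
        return *this;
    }

    // Postincrement: advance, but return a copy of the old position.
    CharListIterator operator++(int)
    {
        CharListIterator old = *this;
        ++*this;
        return old;
    }

private:
    CharList::Element* current_;   // current element, or nullptr past the end
};
```

In this sketch the compiler-generated copy constructor and assignment operator are sufficient, because the iterator holds only a pointer that it does not own.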

What You Need to Build

You are provided with a single file, assign_07.cc, which is the main driver program. You are not allowed to modify assign_07.cc.

You must create the following files:

assign_07.cc
    This must be exactly the file that you downloaded from this Web page.
Makefile
    For this assignment, the Makefile will not be provided. You must write your own, and it must be correct. If you do not provide a Makefile, your program will not compile and you will receive a zero for functionality. Be sure your dependencies are correct; you may wish to use g++ -M to help.
charlist.hh
    This file will contain the interface definition for the CharList and CharListIterator classes. Note that both classes must be defined by this file, either by placing both definitions in the file, or by having it #include whatever file(s) contain the remaining definitions.
*.hh
    Any other header files that you feel are necessary to implement your code. (There is no requirement that there be any other header files, but you might find it useful.)
*.cc
    Any other source files that you feel are necessary to implement your code.

Unlike in homework assignment #3, you are not required to use any specific coding style in the files you create. Since you are creating new files from scratch, any good style is acceptable.

As usual, you can also download the provided file as a bundle, either as a gzipped tar file or as a ZIP archive.

Submission Mechanics

For assignment 7, you must submit the following files:

Testing

The comments at the beginning of assign_07.cc give a number of sample test cases. I suggest that you run all of these tests yourself, as well as a number of others that you have concocted.

We will not limit ourselves to running only the tests given in the comments. You can expect that we will run stress tests in an attempt to break your program. I strongly suggest that you attempt to break it yourself, so that we won't be able to do so.

Tricky Stuff

As usual, there are some tricky parts to this assignment. Some of them are:


This page is maintained by Geoff Kuenning.