The program for this assignment, and everything else except the README file, is due at 9 PM on Wednesday, March 28th, 2001. As usual, the README file is due at 12 midnight on the same day (i.e., the moment Thursday starts). Refer to the homework policies page for general homework guidelines.
The primary purpose of this assignment is to get you used to writing C++ iterators. You will also be developing a preliminary list class. Both the list class and the iterator for it will be useful to you in later assignments.
NOTE: the list class you develop in this assignment will be central to assignments 8 and 9. MAKE SURE you develop it well and debug it thoroughly. If you blow this assignment off, you will do poorly on the following two assignments as well. A correct solution to this assignment will NOT be distributed to the class. It is your responsibility to be sure you have a properly working list class and iterator.
One of the more creative approaches to artificial intelligence is the genetic algorithm, invented by Prof. John Holland of the University of Michigan.
In brief, a genetic algorithm simulates the process of evolution by applying the usual rules of genetics to simulate natural selection. In real life, natural selection's primary goal is the continuation of the species, and organisms that achieve that goal tend to be propagated. In a genetic algorithm, on the other hand, the primary goal is to satisfy a "fitness function" chosen by the programmer. For example, a simple fitness function might interpret the genes of an organism as the value of x in a complicated equation. The natural-selection process could then be tuned to prefer organisms that generate an output near zero, so that the survivors would eventually produce a solution to the equation.
Genetic algorithms were the first step in the current research area called "artificial life", and they have been used to successfully solve many problems that were otherwise intractable.
In this assignment, we will create a program that uses a genetic algorithm to find approximate square roots of integers. Although it is simplified compared to a production implementation, the program demonstrates the basic outline and capabilities of a genetic algorithm.
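To make the fitness-function idea concrete for the square-root problem, here is a minimal sketch. It assumes that the genes are decimal digits stored most-significant digit first, and it uses std::vector in place of the list class. The names genesToValue and fitness are hypothetical and are not taken from assign_07.cc; the provided driver may well score organisms differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: read a digit sequence (most significant digit
// first) as a nonnegative integer.
long long genesToValue(const std::vector<int>& genes)
{
    long long value = 0;
    for (int digit : genes) {
        assert(digit >= 0 && digit <= 9);
        value = value * 10 + digit;
    }
    return value;
}

// Hypothetical fitness: how far x*x lies from the target.
// Smaller is fitter; 0 would be an exact integer square root.
long long fitness(const std::vector<int>& genes, long long target)
{
    const long long x = genesToValue(genes);
    return std::llabs(x * x - target);
}
```

Under this scoring, an organism with genes 0001414 beats one with genes 0001415 for a target of 2000000, because 1414 * 1414 is closer to the target than 1415 * 1415.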
There are three basic processes in evolution: mutation, crossover, and selection. Mutation involves selecting a gene site and modifying it in some fashion, usually by replacing it with another gene. Mutation is very rare both in real life and in genetic algorithms. Crossover is the most important process in generating new organisms. It involves taking two gene strings (usually from two parent organisms), cutting them both at the same point, and re-splicing them so that the head of the result comes from one parent and the tail from the other. Real genetic algorithms usually generate two children in this process, and may splice at more than one point, but we'll simplify things in our implementation.
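The simplified single-point crossover and single-site mutation described above can be sketched as follows. This is only an illustration on std::vector, not the iterator-based code the assignment asks for; the function names crossoverSketch and mutateSketch and their parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-point crossover producing one child: genes before `cut` come
// from headParent, genes from `cut` onward come from tailParent.
std::vector<int> crossoverSketch(const std::vector<int>& headParent,
                                 const std::vector<int>& tailParent,
                                 std::size_t cut)
{
    assert(headParent.size() == tailParent.size());
    assert(cut <= headParent.size());
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(cut);
    std::vector<int> child(headParent.begin(), headParent.begin() + offset);
    child.insert(child.end(), tailParent.begin() + offset, tailParent.end());
    return child;
}

// Mutation: replace the gene at one site (0-indexed) with another gene.
void mutateSketch(std::vector<int>& genes, std::size_t position, int newGene)
{
    assert(position < genes.size());
    genes[position] = newGene;
}
```

For example, crossing parents 1 2 3 4 and 5 6 7 8 at cut point 1 yields the child 1 6 7 8.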
The final step, selection, involves evaluating the organisms according to some criterion (the "fitness function") and choosing the ones that are most successful. In real life, selection is the harsh process of "survival of the fittest." In a genetic algorithm, the same method is used: the least fit organisms are discarded (i.e., killed) without being allowed to reproduce. As in real life, there is some randomness, so that a somewhat unfit organism has a chance of surviving even when a more fit one is discarded. This randomness turns out to be important to the success of the method, since any two slightly unfit parents might (through crossover) generate an extremely fit child.
Because we will not have time to implement an entire genetic algorithm, much of the code has been provided for you. You must supply the underlying data structure (a linked list), and must also write the two small functions that perform mutation and crossover.
An organism will be represented entirely by its gene sequence, which
in turn will be represented using a singly linked list. Each element
in the list will contain only a single integer from 0 to 9
(represented by the C++ type
int), plus a link to the
next element. The list must have a separate header that is not a
plain element, which means that you must implement two classes (the
header and the element). The cleanest approach is to make the element
a nested private class of the header, so that only the header
(IntList) is visible from outside.
You are not allowed to use a doubly linked list in this assignment.
Your linked list must be named
IntList (so that it can be
used by the main driver program) and must support
the following operations. Note that, since the main driver program is
supplied, the function names cannot be changed.
A constructor that takes a const int* array and an integer length for that array, and creates an
IntList that has been initialized with the contents of that array, with one integer per list element. The declaration of this constructor should look like:
    IntList(const int* source, int length);
A pushTail function that inserts a single integer at the tail of the list. The declaration of this function should be similar to the following:
    void pushTail(int value);
This function must operate in O(1) time, which implies that you must maintain a separate tail pointer for the list. You have already done a similar implementation in CS60.
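One possible shape for such a list is sketched below: a header class with a nested private element type, an array constructor, and a pushTail that never walks the list. The class is deliberately named SketchList rather than IntList, and the size and at helpers are hypothetical additions used only to inspect the result. Copying is disabled here purely to keep the sketch short.

```cpp
#include <cassert>

class SketchList {
public:
    SketchList() : head_(nullptr), tail_(nullptr) {}

    // Build the list from an array, one integer per element.
    SketchList(const int* source, int length) : head_(nullptr), tail_(nullptr)
    {
        assert(length >= 0);
        for (int i = 0; i < length; ++i)
            pushTail(source[i]);
    }

    ~SketchList()
    {
        while (head_ != nullptr) {
            Element* doomed = head_;
            head_ = head_->next;
            delete doomed;
        }
    }

    // Assumption for brevity only: this sketch forbids copying.
    SketchList(const SketchList&) = delete;
    SketchList& operator=(const SketchList&) = delete;

    bool isEmpty() const { return head_ == nullptr; }

    // O(1): tail_ already points at the last element, so no traversal.
    void pushTail(int value)
    {
        Element* added = new Element(value);
        if (tail_ == nullptr)
            head_ = added;          // first element of an empty list
        else
            tail_->next = added;
        tail_ = added;
    }

    // Hypothetical inspection helpers (both O(N)).
    int size() const
    {
        int count = 0;
        for (const Element* e = head_; e != nullptr; e = e->next)
            ++count;
        return count;
    }

    int at(int index) const
    {
        assert(index >= 0);
        const Element* e = head_;
        for (int i = 0; i < index && e != nullptr; ++i)
            e = e->next;
        assert(e != nullptr);
        return e->value;
    }

private:
    // The element type is private, so only the header is visible outside.
    struct Element {
        explicit Element(int v) : value(v), next(nullptr) {}
        int value;
        Element* next;
    };

    Element* head_;
    Element* tail_;
};
```

The key point is that pushTail touches only tail_ and, for an empty list, head_; its cost does not depend on the length of the list.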
In addition, you must implement an output operator for
IntList. I suggest
that you use the technique suggested in Weiss: provide a public
Finally, you may find it helpful to implement a few other
standard list functions:
isEmpty, and possibly
popTail. Several of
these functions will be useful in future assignments, and you will
find it much easier to do those assignments if you implement the
functions now, while your list class is simple, rather than waiting
until later when you have converted it into a templated class.
However, only the list above is absolutely required.
You must also implement an iterator for IntList, which
must be named
IntListIterator. The iterator must
support the following functions at a minimum:
A constructor that takes the IntList to be iterated over.
An operator bool that returns true if the iterator can deliver more data, or false if the iterator is expired.
An operator* that returns an int& (so that the integer in the current position can be modified if necessary).
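An iterator with this interface might look like the sketch below. The classes are named TinyList and TinyListIterator to make clear that this is not the required implementation. The preincrement operator is an assumption added so that the iterator can advance, and the use of friendship to reach the private element type is only one of several possible designs.

```cpp
#include <cassert>

class TinyListIterator;

// A stripped-down singly linked list, just enough to iterate over.
class TinyList {
public:
    TinyList() : head_(nullptr), tail_(nullptr) {}
    ~TinyList()
    {
        while (head_ != nullptr) {
            Element* doomed = head_;
            head_ = head_->next;
            delete doomed;
        }
    }
    TinyList(const TinyList&) = delete;             // brevity only
    TinyList& operator=(const TinyList&) = delete;  // brevity only

    void pushTail(int value)
    {
        Element* added = new Element{value, nullptr};
        if (tail_ == nullptr)
            head_ = added;
        else
            tail_->next = added;
        tail_ = added;
    }

private:
    struct Element {
        int value;
        Element* next;
    };
    Element* head_;
    Element* tail_;

    friend class TinyListIterator;  // lets the iterator see Element
};

class TinyListIterator {
public:
    explicit TinyListIterator(TinyList& list) : current_(list.head_) {}

    // True while there is still an element to deliver.
    operator bool() const { return current_ != nullptr; }

    // Returns a reference so the caller can modify the stored integer.
    int& operator*() const
    {
        assert(current_ != nullptr);
        return current_->value;
    }

    // Assumed preincrement: step to the next element.
    TinyListIterator& operator++()
    {
        assert(current_ != nullptr);
        current_ = current_->next;
        return *this;
    }

private:
    TinyList::Element* current_;
};
```

A typical loop is then `for (TinyListIterator it(list); it; ++it) { *it = 0; }`, which relies on operator* returning int& rather than int.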
In addition, you may wish to support a copy constructor, assignment
operator, and postincrement operator. It would not be appropriate to
int is not a
You are provided with a single file, assign_07.cc, which
is the main driver program. You are not allowed to modify
assign_07.cc except by adding code at the "ADD STUFF" comments.
You must create or modify the following files:
A Makefile will not be provided. You must write your own, and it must be correct. If you do not provide a
Makefile, your program will not compile and you will receive a zero for functionality. Be sure your dependencies are correct; you may wish to use
g++ -M to help.
IntListIterator classes. Note that both classes must be defined by this file, either by placing both definitions in the file, or by having it
#include whatever file(s) contain the remaining definitions.
In assign_07.cc, you must provide the functions that
perform genetic mutation and crossover. The mutation function
modifies its list argument in place, changing a single gene at a
specified position (0-indexed). You must use your iterator
for access to the list. The crossover function creates (and
returns) an entirely new list, choosing each gene from one of the two
parents depending on the
position argument. Again, you
must use your iterator to access the parent lists.
The places where you need to provide code are marked by "ADD STUFF" comments.
Because assign_07.cc is provided to you, you must maintain
stylistic consistency in that file. However, you are not required to
use any specific coding style in the
other files that you create. Since you are creating them from scratch, any
good style is acceptable. In particular, you do not have to
match the style of assign_07.cc in those files.
As usual, you can also download the provided file as a bundle, either as a gzipped tar file or as a ZIP archive.
For assignment 7, you must submit the following files:
Testing is your responsibility. We will not provide exact test cases for you. You should test your program a number of times, under different conditions.
In its default condition, the program is nondeterministic (i.e., two successive runs may produce different results). To make testing easier, the program accepts a switch that makes it deterministic. If you use "-S n", where n is an integer, the random seed will be set to that value. Specifying the random seed will allow you to control the program's behavior so that you can reproduce bugs.
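The effect of fixing the seed can be illustrated with a small sketch. How assign_07.cc actually seeds and draws its random numbers is not shown here, so the use of std::mt19937 and the function name randomGenes are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: produce `count` random genes (digits 0 to 9)
// from a generator seeded with `seed`.
std::vector<int> randomGenes(unsigned seed, int count)
{
    assert(count >= 0);
    std::mt19937 engine(seed);                       // same seed, same stream
    std::uniform_int_distribution<int> digit(0, 9);
    std::vector<int> genes;
    for (int i = 0; i < count; ++i)
        genes.push_back(digit(engine));
    return genes;
}
```

Calling randomGenes twice with the same seed in the same build returns identical sequences, which is exactly the property that makes a failing run reproducible.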
You will also find it instructive to run the program with the
-d switch, and to run it for many
different values of the
Judicious reading of the comments, together with experimentation, will
reveal the purpose of these switches and how they interact.
We will not limit ourselves to running only simple test cases. You can expect that we will run stress tests in an attempt to break your program. I strongly suggest that you attempt to break it yourself, so that we won't be able to do so. In particular, make sure you ask it to find the roots of a lot of numbers, all on one command line.
To make it clearer how the program is used, here are some sample runs. First, we can approximate the square root of 2000000 (which is just 1000 times the square root of 2). The "%" represents the command prompt.
    % ./assign_07 -S 12345 2000000
    0001414 * 1414 = 1999396
If we start with a different random seed, we get a different result:
    % ./assign_07 -S 54321 2000000
    0001415 * 1415 = 2002225
A third attempt gives a pretty bad answer:
    % ./assign_07 -S 1 2000000
    0000989 * 989 = 978121
Finally, we can change the number of generations (-g), the mutation rate (-m), the population size (-p), the selection pool size (-s, which should be smaller than the population size), and the number of randomly-chosen survivors (-r, which should usually be pretty small), and run with debugging (-d):
    % ./assign_07 -S 1 -g 100 -m 0.1 -p 100 -s 50 -r 3 -d 2000000
    Generation 0: 0003616
    Generation 1: 0001993
    Generation 5: 0001912
    Generation 7: 0001501
    Generation 11: 0001413
    Generation 22: 0001414
    0001414 * 1414 = 1999396
Note 1: the running time of the program is O(population size * number of generations). Don't use huge numbers or you'll wait all day!
Note 2: If you don't specify the
-S switch, you will get
different results every time you run the program. That's a feature,
not a bug.
Note 3: The defaults are:
As usual there are some tricky parts to this assignment. Some of them are:
Read assign_07.cc before you start, so that you understand the requirements placed on the
crossover; it must return by value or it won't work correctly.
Make sure your IntList destructor, copy constructor, and assignment operator are working before you try to run the main program. Getting these functions right can be quite difficult, and if you don't debug them in isolation, you will experience strange bugs that will be hard to find.
The iterator's dereference operator (operator*) must return an integer by reference (
int&). Otherwise you won't be able to get the mutation operator to work.
pushTail must run in O(1) time. Be sure to do a careful complexity analysis of the function to be sure that it's not O(N).
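Because a list returned by value is copied (or at least must be copyable), the destructor, copy constructor, and assignment operator have to agree on who owns each element. The sketch below shows one common way to get a deep copy, using copy-and-swap for assignment. The class name CopyableList and the front and setFront helpers are hypothetical, and copy-and-swap is just one of several valid techniques.

```cpp
#include <cassert>
#include <utility>

class CopyableList {
public:
    CopyableList() : head_(nullptr), tail_(nullptr) {}

    // Deep copy: allocate a fresh element for every element of `other`.
    CopyableList(const CopyableList& other) : head_(nullptr), tail_(nullptr)
    {
        for (const Element* e = other.head_; e != nullptr; e = e->next)
            pushTail(e->value);
    }

    // Copy-and-swap: build a copy, trade pointers with it, and let the
    // temporary's destructor release the old elements.
    CopyableList& operator=(const CopyableList& other)
    {
        if (this != &other) {
            CopyableList copy(other);
            std::swap(head_, copy.head_);
            std::swap(tail_, copy.tail_);
        }
        return *this;
    }

    ~CopyableList()
    {
        while (head_ != nullptr) {
            Element* doomed = head_;
            head_ = head_->next;
            delete doomed;
        }
    }

    void pushTail(int value)
    {
        Element* added = new Element{value, nullptr};
        if (tail_ == nullptr)
            head_ = added;
        else
            tail_->next = added;
        tail_ = added;
    }

    // Hypothetical helpers for checking that copies are independent.
    int front() const { assert(head_ != nullptr); return head_->value; }
    void setFront(int value) { assert(head_ != nullptr); head_->value = value; }

private:
    struct Element {
        int value;
        Element* next;
    };
    Element* head_;
    Element* tail_;
};
```

A quick isolation test is to copy a list, modify the copy, and confirm that the original is unchanged; if both change, the copy is sharing elements and the destructor will eventually delete them twice.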
© 2001, Geoff Kuenning
This page is maintained by Geoff Kuenning.