CS 152: NEURAL NETWORKS
Evolving a Sigma-Pi Network as a Network Simulator
Justin Basilico


APPROACH

Since the goal is to evolve a sigma-pi network that can simulate other networks, the input to the sigma-pi network consists of the inputs to the network being simulated along with all of that network's weights. Because sigma-pi networks have a fixed input size, any one simulator network can only simulate networks of a particular architecture (or networks that fit within that architecture without using all of it). The goal of the sigma-pi network is to produce the output that the given network (specified by its weights) would produce on the given input.
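As a concrete illustration (the function name here is hypothetical, not from the project code), the simulator's input vector for a two-input, one-output target network would be assembled like this:

    # Hypothetical sketch: building the simulator's input vector for a
    # two-input, one-output target network. The sigma-pi network receives
    # the target network's inputs followed by all of its weights,
    # including the bias.

    def build_simulator_input(net_inputs, net_weights):
        """Concatenate the simulated network's inputs and weights."""
        return list(net_inputs) + list(net_weights)

    # Example: inputs x1, x2 and weights w1, w2 plus bias b
    # give five simulator inputs, matching the first architecture below.
    sim_input = build_simulator_input([0.3, -0.7], [1.5, -2.0, 0.5])
    print(sim_input)  # [0.3, -0.7, 1.5, -2.0, 0.5]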

As it turns out, a sigma-pi network can be built to perform this simulation such that the weight on every existing connection is always 1.0. This means that only the connectivity of the sigma-pi network itself needs to be created, since every weight that exists can be assumed to have the value 1.0.
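A minimal sketch of why weights of 1.0 suffice, assuming the usual definitions of pi units (which multiply their inputs) and sigma units (which sum them); the connectivity below reproduces y = w1*x1 + w2*x2 + b for the two-input, one-output case described later (the function names are illustrative, not from the project code):

    import numpy as np

    # Sketch of a sigma-pi layer pass where every existing connection has
    # weight 1.0, so only the 0/1 connectivity matters.

    def pi_layer(inputs, connectivity):
        """connectivity[j][i] == 1 means input i feeds pi unit j; each pi
        unit multiplies the inputs it is connected to."""
        inputs = np.asarray(inputs, dtype=float)
        return np.array([np.prod(inputs[row == 1]) for row in connectivity])

    def sigma_layer(inputs, connectivity):
        """connectivity[j][i] == 1 means input i feeds sigma unit j; each
        sigma unit sums the inputs it is connected to."""
        inputs = np.asarray(inputs, dtype=float)
        return connectivity @ inputs

    # Simulator input is [x1, x2, w1, w2, b]; three pi units pick out the
    # pairs (x1, w1), (x2, w2), and b alone, and one sigma unit adds the
    # products -- exactly w1*x1 + w2*x2 + b, with all weights equal to 1.0.
    x = np.array([0.3, -0.7, 1.5, -2.0, 0.5])   # x1, x2, w1, w2, b
    pi_conn = np.array([[1, 0, 1, 0, 0],         # x1 * w1
                        [0, 1, 0, 1, 0],         # x2 * w2
                        [0, 0, 0, 0, 1]])        # b
    hidden = pi_layer(x, pi_conn)
    output = sigma_layer(hidden, np.array([[1, 1, 1]]))
    print(output)  # [2.35] = 0.3*1.5 + (-0.7)*(-2.0) + 0.5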

Since only the connectivity of the network needs to be created, it is simple to apply a genetic algorithm to evolve a chromosome representing the network's connectivity, similar to what Miller, Todd, and Hegde (1989) did. Specifically, the chromosome is just a string of binary values (0 or 1) that forms the connectivity matrix between each pair of adjacent layers in the network. For instance, if a layer has n nodes and the previous layer has m nodes (excluding the bias node), then a binary string of length n * (m + 1) is needed to encode the connectivity of that layer.
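A small sketch of this encoding (the helper name is hypothetical): a flat bit string is reshaped into the 0/1 connectivity matrix for one layer.

    import numpy as np

    def decode_layer(chromosome, n, m):
        """Reshape n * (m + 1) bits into an n x (m + 1) connectivity
        matrix (the extra column is for the bias node)."""
        bits = np.asarray(chromosome[:n * (m + 1)])
        return bits.reshape(n, m + 1)

    # Example: a layer of 3 units fed by 5 units plus a bias needs
    # 3 * (5 + 1) = 18 bits.
    chrom = np.random.randint(0, 2, size=3 * (5 + 1))
    print(decode_layer(chrom, n=3, m=5))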

The genetic algorithm operates on these chromosomes as follows. For a population of a given size, the fitness of each chromosome is calculated (smaller fitness being better) and the chromosomes are sorted in increasing order of fitness. To create the next generation, the top 5% of the current generation is first copied directly into the next generation (elitist selection). To create the rest of the next generation, two chromosomes are selected from the current population using rank-based selection, where a chromosome's rank is its position in the fitness-sorted list. The two selected chromosomes are then crossed over, with each component of the chromosome having a 0.1 (10%) chance of being swapped to the other chromosome, and one of the two resulting children is arbitrarily selected to go into the new population. Once all of these chromosomes have been created, each one is mutated by flipping each bit with a probability of 0.01 (1%). This mutation rate is relatively large; however, it seems to work well in combination with the elitist and rank-based selection.
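A sketch of one generation of this procedure, assuming chromosomes are 0/1 integer arrays; following the description above, only the newly created chromosomes are mutated, and the elite copies are left untouched (the function name and exact selection weights are assumptions):

    import numpy as np

    rng = np.random.default_rng()

    def next_generation(population, fitnesses, elite_frac=0.05,
                        crossover_rate=0.1, mutation_rate=0.01):
        pop = np.asarray(population)
        order = np.argsort(fitnesses)            # smaller fitness is better
        pop = pop[order]
        size, length = pop.shape

        n_elite = max(1, int(elite_frac * size))
        new_pop = [pop[i].copy() for i in range(n_elite)]  # elitist copies

        # Rank-based selection: the best chromosome gets the largest weight.
        ranks = np.arange(size, 0, -1)
        probs = ranks / ranks.sum()

        while len(new_pop) < size:
            i, j = rng.choice(size, size=2, p=probs)
            a, b = pop[i].copy(), pop[j].copy()
            swap = rng.random(length) < crossover_rate   # uniform crossover
            a[swap], b[swap] = b[swap], a[swap]
            child = a if rng.random() < 0.5 else b       # keep one child
            flip = rng.random(length) < mutation_rate    # bitwise mutation
            child[flip] ^= 1
            new_pop.append(child)

        return np.array(new_pop)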

The fitness function applied to a network is the mean squared error that the sigma-pi network produced by the chromosome achieves over a testing set of 100 random networks that it is to simulate, averaged over the set. The error is computed by comparing the simulator's actual output with the target output, which is the output of the random network being simulated. To simplify the task, the network being simulated is assumed to have linear activation functions instead of sigmoid functions. Sigmoid units make the fitness landscape much harder to deal with, since the output values of networks with sigmoid units cluster around 0.5. When the error is used as the fitness function, the evolved network then tends to simply guess 0.5, since that already gives a fairly low average error. A linear activation function gives a more uniform distribution of output values, whereas a sigmoid function gives something closer to a normal distribution. Although most networks use non-linear activation functions such as sigmoids, the architecture of the sigma-pi network is the same for any activation function, so all activations are assumed to be linear in order to simplify the problem. Other fitness functions were also tried, such as counting the number of outputs falling outside some threshold of the target while using sigmoid functions. These did well on the smaller networks, but on the larger network they did not work at all. Using linear activation functions with the mean squared error as the fitness function turned out to be the best combination.
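In code, the fitness measure might look like the following sketch, where simulate stands for the chromosome's sigma-pi network forward pass and test_cases holds the 100 (simulator input, target output) pairs; both names are placeholders, not the project's actual functions:

    import numpy as np

    def fitness(simulate, test_cases):
        """Average mean squared error over (simulator_input, target_output)
        pairs; smaller is better."""
        errors = [np.mean((simulate(x) - t) ** 2) for x, t in test_cases]
        return float(np.mean(errors))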

The random testing networks are generated with weight values randomly assigned between -5.0 and 5.0. In addition, random input values are created in the range -1.0 to 1.0. Each network is a simple fully-connected feed-forward network that uses only summation units, as is typical of networks such as backpropagation networks. The input to the simulator network consists of these randomly created input values along with the randomly created weight values, and the target output that the simulator network is trying to achieve is the output that the random network produces on those input values.
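A sketch of generating one such test case for the two-input, one-output task (the function and variable names are illustrative):

    import numpy as np

    rng = np.random.default_rng()

    def random_test_case(n_inputs=2):
        """Weights drawn from [-5, 5], inputs from [-1, 1]; the target is
        the linear network's output w1*x1 + w2*x2 + b."""
        x = rng.uniform(-1.0, 1.0, size=n_inputs)
        w = rng.uniform(-5.0, 5.0, size=n_inputs + 1)  # weights plus bias
        target = w[:-1] @ x + w[-1]                    # linear activation
        sim_input = np.concatenate([x, w])             # simulator's input
        return sim_input, target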

This procedure is applied to three different network architectures. The first architecture to be simulated is a network with two input units and one output unit; the sigma-pi network for this architecture uses five input units, three hidden units (pi), and one output unit (sigma). The second is a little larger, with two input units and two output units; the sigma-pi network in this case has eight input units, six hidden units (pi), and two output units (sigma). The third network is a combination of the first two, with two input units, one hidden layer of two units, and one output unit. The sigma-pi network for this architecture is larger: eleven input units, three hidden layers of nine units (pi), five units (sigma), and three units (pi), and one output unit (sigma). After the genetic algorithm finishes on each of these tasks, the best network it produced is inspected and tested against another set of 100 random networks to see how well it generalizes. The three configurations are summarized below.
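For reference, the three configurations can be written compactly; the layer counts are taken from the description above, while the dictionary layout itself is just illustrative:

    # (simulated network layer sizes) -> (sigma-pi network layer sizes)
    ARCHITECTURES = {
        "2-1 net":   {"simulated": (2, 1),    "sigma_pi": (5, 3, 1)},
        "2-2 net":   {"simulated": (2, 2),    "sigma_pi": (8, 6, 2)},
        "2-2-1 net": {"simulated": (2, 2, 1), "sigma_pi": (11, 9, 5, 3, 1)},
    }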

