Applying a Genetic Algorithm to Find
the Optimal Weights of a
Recurrent Neural Network

A final project for CS152 by Tselil Schramm

Introduction

This project set out to explore the application of genetic algorithms to the weights of a neural network as a substitute for training algorithms in recurrent neural networks. Training methods for Recurrent Neural Networks (RNNs) are especially interesting because the vanishing gradient problem tends to make backpropagation unusable. Many training methods have been applied to RNNs, including Backpropagation Through Time and Simulated Annealing to name a few (see [4]). Each method comes with its own set of drawbacks, so alternative methods are an active area of research. Genetic algorithms have already recieved some attention in this area; for examples, see references.

To test and demonstrate the capabilities of a RNN trained with a genetic algorithm, I chose to use the algorithm to find the optimal weights for balancing a pendulum on a cart.

The final goal of the project was to evolve a network that could balance a pendulum by outputting the force to apply to the cart at each step, given only the angle of the pendulum as input.

Network Architecture

In order to let the algorithm find the absolutely optimum solution, very few presuppositions were made about the structure of the network. All networks started as fully connected directed graphs, though some weights were randomly set to zero so that lack of connection would be present in the gene pool. The idea was that if lack of connection gave a network a fitness advantage, it would be selected for evolutionarily. The network did follow a general forward-propagation; there was a predetermined order in which the node outputs were calculated based on the nodes they were connected to, so that some of the nodes were sharing their previous time step output and some nodes were sharing their current time step output.

For example, in the figure above, a single forward propagation begins with node 1. The black arrows represent connections in the forward propagation direction, and the blue arrows represent feedback connections. The output of node 1 at timestep t is an input to nodes 2 and 3 at timestep t, and the output of node 2 at timestep t is an input to node 3 at timestep t. At timestep t+1, the output of nodes 1,2, and 3 from timestep t are all inputs to node 1, while only the output of nodes 2 and 3 from timestep t are inputs to node 2 at timestep t+1, etc. This can of course be genralized to larger networks.

Again, the supposition was that if having layers was advantageous, then the weight of the input between two sequential nodes would evolve to zero, so they would not rely on each other's current time step output. For example, in the figure above, we can imagine that setting the black arrow between nodes 1 and 2 would eliminate the time t input to node 2; this is equivalent to having nodes 1 and 2 in the same layer. This ended up happening at least once in the fittest anlge-input-only network, as seen below.

All nodes in the network had a bias and the external inputs as inputs to the node. The one restraint imposed on the networks was that the output node was consistently chosen to be the second-to-last node in the forward-propagation; this was an accidental choice based on an indexing error that was not caught until very late in the project. Though originally the intention was that the last node in the forward-propagation would be the output, the networks were fully functional with the second-to-last node as the output node, so there did not seem to be a need to correct the mistake.

Modeling the Inverted Pendulum

One of the initial challenges faced was accurately modelling the motion of an inverted pendulum. In [2], I found equations of motion for the pendulum and more importantly the suggestion to use Euler's method to model them linearly. The equations for the second derivatives of angle and position at time t are as follows:

,
,

And the angle, angular velocity, position, and linear velocity are approximated numerically by Euler's Method:

where τ is the time step size.

As it turns out, there is an exact solution to this problem which depends on the angle, angular velocity, x coordinate and the velocity of the cart:

where the k_i's are constants dependant on the system.

This means that given all four inputs, it is possible to balance the pendulum with a single neuron [3]. However, the challenge of this project is to balance the pendulum using only the angle or only the angle and angular velocity as inputs, allowing the network to exploit its feedback mechanisms to deduce the angular velocity in the first case and take into account its position and velocity of the cart in both cases using the current state. Using only the angle as an input is a goal is discussed in [3], and this source acknowledges that this is a challenging problem, though the authors report very good results.

Genetic Algorithm Setup

Organisms

Each organism's genome was a list of floats representing the weights of connecting edges between each node in the RNN; because the weights are not bi-directional, each list of weights contains a weight for the edge (i,j) and the edge (j,i) for every node i and every node j. The starting population in the algorithm was generated by creating a random genome as follows:
For each connection between two nodes:

  1. Choose a random float from the interval (0,1)
  2. Multiply with equal probability by -0.5, 0, or 0.5
As mentioned previously, zero-valued weights are equivalent to absent edges; zero-valued weights were probabilistically introduced into the gene pool in step 2 so that the possibility of an absent connection would be present and subject to selection. Weights were chosen in a range of (-0.5,0.5) because it was discovered that this initial range of weights yields the most rapid convergence.

Mutation

Mutation was performed by probibalistically selecting between 1 and 3 locations in the genome and adding a random number in the range (-1,1) with increments of 0.1 to the weight at that locus.

Recombination

I implemented and tested two kinds of recombination operators. The first was traditional uniform crossover, in which several indices are chosen identically from the genomes of the parents, and are then swapped. I found that randomly choosing about half the genes and swapping those genes with 0.5 probability was most effective; an expected 25% of the genes were swapped each time, but the actual percentage varied between 0% and 50%. Using a lower expected percentage did not produce children different enough from the parents, slowing down the evolutionary process.

The second method I implemented was similar to uniform crossover, but instead of swapping genes at the selected loci, the average value of the genes at that location was inserted into the genome. Although this is not a standard genetic operator, my idea was that if both parents are fit, a value between two weights at a particular edge was likely to be optimal.

After testing both operators, I decided to use uniform crossover as my recombination operator, because I found that the second method quickly eliminates genetic diversity in the population; genetic diversity came only from mutations, which made for very slow evolution.

Fitness Function

Initially, the fitness function I used measured the number of time steps between the time at which the angle of the pendulum first arrives at 0 degrees until the angle dips below 90 degrees. This fitness function was not very effective for several reasons. For one, it does not allow the network enough time to establish balance before beggining to evaluate the fitness. Also, there is no penalty for larger angles; a network that manages to keep the pendulum between -90 and 90 degrees for the entire time interval is equal in fitness to a network that balances the pendulum perfectly.

In [1], I found a fitness function from a similar application that looked promising. The author waits a number of time steps for the network to come to equilibrium, then sums the number of steps for which the network is able to keep the pendulum between -90 and 90 degrees, but subtracts at each time step a penalty for the position of the pendulum away from equilibrium:

I implemented a variation of this fitness function, in which I imposed a penalty based on the angle rather than the x and y coordinates of the pendulum:

I also started evaluating fitness after 100 timesteps rather than immediately following the first occurrence of θ = 0, to allow the network to stabilize before penalizing for lost balance.

Evolutionary Process and Parameters

When performing evolution, I used between 7 and 10 nodes, as [1] seemed to indicate that more than 5 hidden-layer neurons were necessary (though this does not really apply to my model) and [3] seemed to indicate that 10 nodes were employed successfully for balancing the pole with angle-only input. Each organisms's fitness was evaluated with initial pendulum angles of ±10, ±20, ±60, and ±100. The fitness was evaluated starting at timestep 100 and until balance was lost or 200 additional timesteps had passed.

I initially experimented with the mutation rate and survival rate, and found that the best results were given by a mutation rate of 0.6 and a survival rate of 0.4. However, I do not have rigorous experimental results showing that these are the optimal parameter values, because each run of the genetic algorithm was highly time consuming and I did not have the time in the scope of the project to run extensive tests.

I also found that a population size of 75-100 was most successful in consistently yielding high-fitness organisms. Smaller population sizes sometimes yielded highly fit organisms, but sometimes the organisms froms small populations evolved very slowly. I concluded that a small number of organisms restricts the gene pool too much and makes fitness values too erratic and more easily dominated by a single organism, so I chose population sizes in the range of 75-100, though the larger population took far longer to compute.

I did succeed in performing 5 experimental runs in which I evolved networks of sizes 5,7, and 10 nodes in population sizes of 25, 50, and 75, for 100 generations, the results of which are summarized in the table below.

Population Size Number of Nodes Average % Fitness Minimum % Fitness Maximum % Fitness
255
7
10
12
17
12
5
8.5
0.7
17
20
25
505
7
10
13
24
20
1
2
17
21
55
21
755
7
10
16
20
17
2
19
3
20
21
20

Though there are only 5 data points for each category, it is clear that the genetic composition of the initial population is very important in determining the fitness of organisms 100 generations later. This is especially evident from the spread between the maximum and minimum fitness in each case; runs under identical conditions can produce in one case organisms with only 2% fitness, but in another case 55% fitness. Other apparent trends would need more data for support, but the maximum and minimum values can only increase in their maximality as more runs are added.

This seems to suggest that in order to evolve a really fit network without modifications to the algorithm, many runs of the algorithm with a relatively large initial population are necessary.

Best Techniques

I found that from running the genetic algorithm for as many as 500 generations, the highest fitnesses achieved were on the order of 60% of the total possible fitness. I decided to interfere in several ways to try to induce the evolution of fit organisms, some of which were more successful than others. These techniques are summarized below:

  1. Seed the population with known fit organisms - I decided that if known fit organisms were present in the initial population, fit organisms were more likely to emerge later in evolution. Thus, I tried running evolution with an initial population consisting of randomized organisms with previously discovered fit organisms introduced. This was somewhat successful, but often the fit organisms would take over the gene pool rapidly, potentially locking the algorithm in a local minimum. I hypothesize that if known fit organisms were added to the population later in the evolutionary progress, when the population already has its own fit organisms, we may actually see benefits. However, I have not yet had the chance to test this method out.
  2. Mutant Attack - When I saw the genetic diversity in the population dropping, I would switch to a higher mutation rate for 5-10 generations in the hopes of introducing new genes to the pool. This method was effective, but irritating to implement because I had to be present in order to enact it. A future direction for this method may be to build a diversity check into the algorithm, automating the process of increased mutation rate. However, when I performed this method by hand, I consulted my "gut feeling" based on the progress I had seen in the network over a long period of time; the number of generations for which the organism was dominating, the speed with which its genes proliferated, and the stage in the evolutionary process all played a role. I was also able to stop the mutant attack and change to a lower mutation rate if it appeared to be doing more harm than good. In order to implement this to happen automatically, extensive testing would have to be done to determine when a mutant attack is a good idea, so the "gut feeling" can be replaced by an algorithm.
  3. Gradually decrease the angle tolerance - I found that if I started with an angle tolerance that was too low (say, ±10 degrees), no fit networks would evolve at all. Instead, starting with a tolerance of ±90 degrees and gradually decreasing this tolerance as fit organisms emerged proved to be a very effective method, by which I found the fittest organisms for both the single- and double-input cases.

Results

Angle Input Only

Directly below is the schematic of the fittest network produced that takes only angle as input. Its evolution took about 500 generations. It is capable of balancing the pendulum within about ±45 degrees for extended periods of time given starting angles between -60 and 60 degrees


Note that nodes 1 and 3 are arranged in a "layer."

Videos of this network and its earlier ancestors balancing pendulums are shown below.

Fittest network starting at 30 degrees:



Fittest network Starting at 45 degrees:



Fittest network starting at 60 degrees:



Fittest network starting at 80 degrees:

This video demonstrates that the network cannot balance when then pendulum starting position is at 80 degrees. This is true for all angles larger than 60 degrees.

It is interesting to compare the fittest network to an earlier ancestor of the network, as seen below. The amount of fitness acquired in just 100 generations is very impressive!



Fittest network ancestor starting at 30 degrees:


Angle and Angular Velocity Inputs

Below is a diagram of the fittest network evolved that takes in angle and angular velocity as inputs. It took about 600 generations to evolve. It is capable of completely balancing the pole given any starting angle.

Videos of this network and its earlier ancestors balancing pendulums are shown below.

Fittest network starting at 30 degrees:



Fittest network starting at 70 degrees:



Fittest network starting at 180 degrees:



Again, we compare the fittest network to an earlier ancestor of the network.


Fittest network ancestor starting at 80 degrees:



Discussion

The fitness of the network with angle and angular velocity as input is obviously much greater than the angle-only input network. Clearly, balancing with a single input is a harder problem. However, it is possible that the fitness of the single-input network can be further improved in successive generations of evolution; the difference in the fitness of the fittest network and its ancestor is encouraging to this end.

It is interesting that for both fittest networks, the network architechture is probably very different from anything a human being whould have designed. Although it appears that the graphs are fully connected, there are several connections missing. Furthermore, there does not seem to be a high occurrence of layering; only in the angle-only instance were the nodes better formed in a layer, and even in this case there is only one "layer" in a graph of 10 nodes.

Conlusions and Future Directions for Research

The genetic algorithm was able to evolve networks that balance the pendulum relatively well; this is especially true for the angle and angular velocity input network, which is able to completely stabilize the pendulum starting from any angle between 0 and 360 degrees. The angle-only input network also achieved a relatively impressive level of fitness, given the acknowledged difficulty of the problem in the literature (see [3]).

I suspect that my results could be improved in a variety of ways. In references that I read, several modifications to the genetic algorithm and the network architecture were suggested that I found intriguing, but was unable to pursue due to time constraints. Furthermore, suggestions from my peers and professor and queries that have arisen in my research are also worth investigating. They include:

  1. Using two outputs on the angle-only input networks, one of which aims to guess the x-coordinate of the cart and feeds back into the network. This reportedly reduced the number of generations for evolution to 30, as reported in [3].
  2. Using a variation on the genetic algorithm to create more selective pressure. For example, in [1] Tournament Selection and Steady State Genetic Algorithms are discussed as successful optimizations for the genetic algorithm.
  3. Running more extensive experiments to find the optimal algorithm parameters, such as mutation rate, population size, survival ratio, number of nodes, etc.
  4. Successfully implementing the tampering methods mentioned above so that they become an algorithmic and more quantitative process.
  5. Exploring different mutation operators to see if they contribute to genetic diversity more meaningfully.
  6. Training with one initial angle at a time (suggested by Professor Keller). In practice, it seems that networks were quicker to learn how to balance the pendulum with a starting degree of ±10. It would be useful to exploit this and start training the networks to only balance with starting angles of magnitude 10 degrees, then slowly increasing the initial angle. This would greatly decrease the runtime of the fitness function.

References and Acknowledgements

I would like to acknowledge Professor Keller for his advice and ideas throughout the project.

[1] Syed, Omar. "Applying Genetic Algorithms to Recurrent Neural Networks
      for Learning Network Parameters and Architecture." Masters Thesis,
      Department of Electrical Engineering, Case Western Reserve
      University. (1995)
 
[2] Jason Browlee. "The Pole Balancing Problem". Technical
      Report. Centre for Intelligent Systems and Complex Processes,
      Swinburne University of Technology. (2004)
 
[3] Alexis P. Wieland. "Evolving Neural Network Controllers for
      Unstable Systems." IJCNN-91-Seattle International Conference
      on Neural Networks
. July 1991.
      <http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=155416>
 
[4] S. Haykin. "Neural Networks: A Comprehensive Foundation."
      Prentice Hall, New Jersey, 1999. (pg. 558, 750)
     

Code

Below is a link to my python code for the evolution of network weights and the visual simulation of a pendulum on a cart. VPython is required for running the simulations.

GeneticRNN.py

Instructions for performing evolution or running a demo can be found at the top of the code file.

It should be noted that others experienced difficulties when attempting to perform evolution on their computers. No problems have been experienced when running the simulations on other computers. It seems as though the problem has at present been corrected, however, this has only been tested on one computer other than my own.