One of the problems with neural networks is that a single network handles only one type of problem. For some problems, it may be difficult to make the network general enough; in general, the one-size-fits-all approach does not work very well. For example, consider driving a car: driving in snow is different from driving on a freeway, which is different from driving on surface streets, and so on.
Instead of having one backprop network trying to accomplish everything, my project will use an LVQ network to select the situation and then apply a backprop network. Instead of selecting a final class, the LVQ will select a network and pass the input to it. If necessary, the input may be transformed between the two networks.
One of the complications is that instead of simply picking the class and learning from that, the LVQ must run the input through each of the networks to determine the error. Therefore, one could either use the typical measures (Euclidean Distance, Hamming Distance, etc.) associated with LVQ, or one could use the MSE for each of the networks to form the clusters.
Initial ideas on the project covered training and specific goals relating to the training. After the proposal was submitted, the training model changed because the suggestions in the original proposal had serious flaws.
Problem Statement
To design and implement a network that consists of a Learning Vector Quantization (LVQ) network and a number of back-propagation networks. The LVQ network will direct input to one of the back-propagation networks. The network should read its configuration from files. The number of weights and the structure of the back-propagation networks should be configurable, as well as the learning rates, etc.
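The report does not show the actual configuration file layout, so the sketch below assumes a simple hypothetical `key = value` text format, with one `bp` line per back-propagation sub-network, just to illustrate how the structure and learning rates could be made configurable:

```python
# Hypothetical configuration format for the hybrid network; the actual file
# layout used by the project is not specified, so this parser and its keys
# (lvq_weights, lvq_rate, bp_rate, bp) are illustrative assumptions.
def parse_config(text):
    """Parse a configuration into a dict: LVQ weight count, learning rates,
    and one layer-spec list per back-propagation sub-network."""
    cfg = {"bp_nets": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "lvq_weights":
            cfg["lvq_weights"] = int(value)
        elif key in ("lvq_rate", "bp_rate"):
            cfg[key] = float(value)
        elif key == "bp":
            # e.g. "bp = logsig 3, purelin 2" -> [("logsig", 3), ("purelin", 2)]
            layers = []
            for part in value.split(","):
                fn, size = part.split()
                layers.append((fn, int(size)))
            cfg["bp_nets"].append(layers)
    return cfg

example = """
lvq_weights = 4
lvq_rate = 0.05
bp_rate = 0.1
bp = logsig 3, purelin 2
bp = logsig 3, purelin 2
"""
cfg = parse_config(example)
```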
The results from this hybrid network will be compared to the
output from a standard back-propagation network.
Approach
In general, the approach is to break the input space into smaller regions and then to solve each one individually. This is accomplished using the LVQ network, which maintains n weight vectors. Each input is compared to the weight vectors, and the closest weight is adjusted so that the weights come to represent the inputs. The question is how the distance is defined. Originally, the squared error of each back-propagation network was considered as the distance: the network with the lowest error was closest to the input. However, this only works during training because it requires knowledge of the correct output. Therefore, the squared error was discarded as a measure of distance, and the inner product of the input vector and the weight vector was used instead.
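The selection and adjustment steps described above can be sketched as follows. Since the inner product is used as the similarity measure, the *largest* score selects the winning weight; the function names and the learning rate are illustrative, not taken from the actual implementation:

```python
import numpy as np

def select_weight(x, weights):
    """Return the index of the weight vector with the largest inner product
    with the input x (the report's distance measure)."""
    scores = weights @ x
    return int(np.argmax(scores))

def update_weight(x, weights, k, rate=0.05):
    """Move the winning weight vector k toward the input, so the weights
    come to represent the inputs they match."""
    weights[k] += rate * (x - weights[k])
    return weights

# Toy example: three weight vectors and one input near the first of them.
weights = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
x = np.array([0.9, 0.1])
k = select_weight(x, weights)
```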
The network implementation itself utilized the back-propagation network written earlier in the semester and allows other distance measures to be incorporated. The network can be configured so that it is a simple LVQ network instead of the hybrid network. Additionally, other types of networks could be incorporated. The LVQ portion of the network only contains a classification type for each weight. This type may contain a back-propagation network or some other type of network.
Since the two networks must be trained simultaneously, training can be difficult. The LVQ network implements the LVQ2 algorithm, which differs from LVQ1 in that it always moves the nearest correct weight towards the input. In both algorithms, the weight closest to the input is adjusted: if it correctly classifies the input, it is moved closer to the input, but if it incorrectly classifies the input, it is moved away.
When using the hybrid network, the idea of correct and incorrect classification becomes less clear. In this case, an input is correctly classified by a weight if the output of the associated back-propagation network matches (within some tolerance) the desired output. If no such network exists, then the network with the lowest error is considered the correct network.
Training cycles through each of the inputs and compares them to the weight vectors in the LVQ network. For each input, the software remembers which input matched which weight; in other words, it remembers the partitioning of the input space. Once all of the inputs have been processed, each partition is used to train the back-propagation network that corresponds to the associated weight vector. Each back-propagation network is only trained for a limited number of epochs, so that it does not completely learn its inputs, because the partitioning of the input space will change as the LVQ network learns.
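One outer training pass as described above can be sketched as follows. Here `lvq_select` and `bp_train` stand in for the LVQ winner-selection and back-propagation training routines (hypothetical interfaces, not the project's actual function names):

```python
def training_pass(inputs, targets, lvq_select, bp_nets, bp_train, max_epochs=10):
    """Partition the inputs by LVQ winner, then briefly train each
    back-propagation sub-network on its partition."""
    # Remember which inputs matched which weight (the partitioning).
    partitions = {k: [] for k in range(len(bp_nets))}
    for x, t in zip(inputs, targets):
        partitions[lvq_select(x)].append((x, t))
    # Train each sub-network for only a few epochs, since the partitioning
    # will keep shifting as the LVQ weights move.
    for k, samples in partitions.items():
        if samples:
            bp_train(bp_nets[k], samples, epochs=max_epochs)
    return partitions
```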
Feedback for hybrid network during training
Results
The hybrid network was compared with a back-propagation network on four sample data sets. The first data set consisted of two inputs and two outputs. The inputs were arranged around a unit circle, and the two outputs were functions of the inputs; each quadrant received a different function, as shown below.
Input Definition
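The per-quadrant functions were given in the input-definition figure, which is not reproduced here, so the generator below uses hypothetical placeholder functions; the structural point it illustrates is that the points lie on the unit circle and the outputs depend on which quadrant the input falls in:

```python
import numpy as np

def make_circle_data(n=100, seed=0):
    """Generate n points on the unit circle with two outputs that depend on
    the quadrant. The per-quadrant output functions are placeholders, not
    the ones actually used in the project."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # unit-circle inputs
    quadrant = (x[:, 0] < 0) * 2 + (x[:, 1] < 0)          # 0..3, one per quadrant
    # Hypothetical outputs: a different linear mix of the inputs per quadrant.
    coeffs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    y = np.stack([coeffs[quadrant, 0] * x[:, 0],
                  coeffs[quadrant, 1] * x[:, 1]], axis=1)
    return x, y
```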
The other three data sets were the 7-segment display data, a single sine wave, and the cancer data file.
| Data Set | LVQ-BP Config | BP Config | LVQ-BP MSE | BP MSE | Best Type |
| --- | --- | --- | --- | --- | --- |
| Circle | 4 x purelin 2 | logsig 3, purelin 2 | 0.00091 | 0.00278 | LVQ-BP |
| 7 Segment | 4 x (logsig 3, hardlim 10) | logsig 4, hardlim 10 | 0.0 | 0.0 | BP |
| Cancer | 4 x (logsig 3, hardlim 1) | logsig 9, hardlim 1 | 0.163 | 0.0315 | BP |
| Sin | 3 x (tansig 3, purelin 1) | tansig 4, purelin 1 | 0.849 | 0.006 | BP |
In each case, some parameter tuning was performed to get the networks to perform better. The hybrid network worked quite well on the circle data, but it did not work as well on the others. For some reason, performance on the sine wave was terrible. On the 7-segment data, both the LVQ-BP network and the BP network obtained 0 error, but the back-propagation network did it faster and with fewer nodes, so it was picked as the better performer. The BP network used was the one located in /cs/cs152/backprop/ on turing.
In the case of the circular data, the hybrid network outperformed the BP network, but in all other cases the BP network produced a lower error rate. The most likely reason is that in the first case, the inner-product information is actually relevant because similar types are grouped together in the input space. The other three data sets, on the other hand, do not have an underlying geometric structure that the LVQ network can pick out. Trying to use distances in the input space probably makes the problem even more difficult, because it splits the input into multiple segments based on artificial criteria.
When there is a geometric grouping, the hybrid network performs quite well. This means that in cases where something is known about the structure of a problem, splitting the problem based on that structure can improve the overall performance. However, the inputs must have a pattern that matches how the network splits the data. A good example of what can go wrong is the Sin data set: a sine wave definitely has a geometric pattern in input space, but it is not a pattern that the LVQ network can pick out. As a result, the performance of the hybrid network on the sine data set is abysmal.
Overall, the hybrid network works for specific situations and shows
that problems can be split up by neural networks.