This is the main part of the project. Given four coarse-coded images, it produces a numerical score indicating how likely a face is to appear in the image from which the coarse-coded set was derived.
Two separate networks achieve this task, one for each training pattern. Two facial images are used as the training patterns.
The network never actually sees these images; it uses their coarse-coded representations. Each network is trained to return 1.0 if it sees its pattern, or -1.0 if it sees a solid black field. Of course the network will never return those extreme values exactly, but rather something in between. Each network
is a single tanh-activated neuron with only third-order weights. This third-order structure makes the neuron invariant to geometric transformations, which is why it doesn't matter that one of the training patterns is tilted: that information is ignored. Once the neurons have been trained (using the standard delta rule), they are ready to face-detect images. The coarse-coded images are run through each neuron, and the outputs of the two neurons are added together to yield the final score. Since each neuron returns values in the range -1.0 to 1.0, the total score ranges from -2.0 to 2.0.
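A minimal sketch of how such a neuron, its delta-rule training, and the summed score might look in Python. The coarse-code vectors, learning rate, and epoch count here are illustrative assumptions, not values from the project; the real inputs would come from the coarse-coding stage.

```python
import itertools
import math
import random

def third_order_features(x):
    # All products x[i] * x[j] * x[k] for i <= j <= k: the third-order terms.
    return [x[i] * x[j] * x[k]
            for i, j, k in itertools.combinations_with_replacement(range(len(x)), 3)]

class ThirdOrderNeuron:
    """A single tanh-activated neuron with only third-order weights."""

    def __init__(self, n_inputs, seed=0):
        rng = random.Random(seed)
        n_weights = len(third_order_features([0.0] * n_inputs))
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_weights)]

    def forward(self, x):
        feats = third_order_features(x)
        return math.tanh(sum(w * f for w, f in zip(self.w, feats)))

    def train_step(self, x, target, lr=0.1):
        # Standard delta rule: w_i += lr * (target - y) * tanh'(net) * feature_i
        feats = third_order_features(x)
        y = math.tanh(sum(w * f for w, f in zip(self.w, feats)))
        delta = lr * (target - y) * (1.0 - y * y)  # (1 - y^2) is tanh's derivative
        for i, f in enumerate(feats):
            self.w[i] += delta * f

def face_score(networks, coarse_codes):
    # Sum of the per-network outputs; two tanh units give a score in [-2.0, 2.0].
    return sum(net.forward(coarse_codes) for net in networks)

# Stand-in coarse codes (assumptions for illustration only):
pattern = [0.9, 0.2, 0.8, 0.1]   # hypothetical code of a face pattern
black   = [0.1, 0.8, 0.2, 0.9]   # hypothetical code of the solid black field

neuron = ThirdOrderNeuron(len(pattern))
for _ in range(500):
    neuron.train_step(pattern, 1.0)   # the pattern maps toward +1.0
    neuron.train_step(black, -1.0)    # the black field maps toward -1.0
```

After training, `neuron.forward(pattern)` sits near 1.0 and `neuron.forward(black)` near -1.0, without ever reaching either extreme, matching the behavior described above.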
This section of the program could be improved as well. The training patterns could be researched to find the optimal images for this task. A different angular resolution could be used in the network, possibly resulting in better recognition capabilities. Finally, it may be a good idea to weight the sum of the outputs of the two networks: since input images may lean more toward one pattern than the other, it doesn't make much sense to add the two values with equal weight.
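One way the suggested weighted sum might look. The stub networks and the 3:1 weighting below are purely illustrative assumptions; in practice the weights would have to be chosen empirically, e.g. from validation results.

```python
class StubNet:
    # Hypothetical stand-in for a trained pattern network: returns a fixed output.
    def __init__(self, out):
        self.out = out

    def forward(self, coarse_codes):
        return self.out

def weighted_face_score(nets, weights, coarse_codes):
    # Weighted sum of per-network outputs instead of an equal-weight sum.
    # If the weights sum to 2.0, the score keeps the original [-2.0, 2.0] range.
    return sum(w * net.forward(coarse_codes) for w, net in zip(weights, nets))

# Example: favor the first pattern 3:1 (weights are assumed, not tuned values).
nets = [StubNet(0.8), StubNet(-0.4)]
score = weighted_face_score(nets, [1.5, 0.5], coarse_codes=None)
# 1.5 * 0.8 + 0.5 * (-0.4) = 1.0
```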