PSRI Invariant Face Detection Using Third Order Neural Nets

Dmitriy Kogan
CS152 - Neural Networks
Professor R. Keller
Harvey Mudd College
Fall 2000
 

Proposal

Face detection is useful in many applications. For example, a face identification program needs to find the face in an image before it can identify it, and lip-reading software needs to locate the lips before it can turn a stream of images into text. Face detection software already exists, some of it based on neural networks and some not. However, most of these programs have serious limitations. Some use skin color to find the face, which is unreliable because of shadows on the face and the variation in skin color between people. These programs are also prone to false positives caused by skin-colored objects or other body parts. Many face detection programs also handle geometric transformations, such as rotations, poorly. This project attempts to overcome that last limitation.
 

Problem statement

The goal is to design and build a PSRI (position, scale, and rotation invariant) face detection system. The output should be the coordinates of the face found in the input image.
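
The reason a third-order network suits this goal can be illustrated with a small sketch. As described in the higher-order network papers listed in the references, a third-order unit combines three pixels at a time, and the interior angles of the triangle those pixels form do not change when the image is shifted, scaled, or rotated in the plane. The code below only illustrates that geometric fact and is not the project's code; Point, sortedAngles, transformPoint, and anglesMatch are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Point { double x, y; };

// Interior angles (in radians) of the triangle a-b-c, sorted ascending so
// that the result does not depend on the order in which vertices are given.
std::array<double, 3> sortedAngles(Point a, Point b, Point c) {
    auto angleAt = [](Point p, Point q, Point r) {
        // Angle at vertex p between the rays p->q and p->r.
        double ux = q.x - p.x, uy = q.y - p.y;
        double vx = r.x - p.x, vy = r.y - p.y;
        return std::atan2(std::fabs(ux * vy - uy * vx), ux * vx + uy * vy);
    };
    std::array<double, 3> angles = {angleAt(a, b, c), angleAt(b, c, a), angleAt(c, a, b)};
    std::sort(angles.begin(), angles.end());
    return angles;
}

// Similarity transform: scale by s, rotate by theta, then shift by (tx, ty).
Point transformPoint(Point p, double s, double theta, double tx, double ty) {
    assert(s > 0.0);
    return {s * (p.x * std::cos(theta) - p.y * std::sin(theta)) + tx,
            s * (p.x * std::sin(theta) + p.y * std::cos(theta)) + ty};
}

// True if two angle triples agree within a small tolerance.
bool anglesMatch(const std::array<double, 3>& u, const std::array<double, 3>& v,
                 double tol = 1e-9) {
    for (std::size_t i = 0; i < 3; ++i)
        if (std::fabs(u[i] - v[i]) > tol) return false;
    return true;
}
```

Applying transformPoint with the same parameters to all three vertices leaves sortedAngles unchanged up to rounding, whereas stretching only one axis changes the angles. Sorting the angles also makes mirror-image triangles look the same, which is a simplification of this sketch.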
 

Approach

Before anything involving neural nets could be done, a lot of preprocessing had to be performed on the images. Each component is described in detail in its own hyperlinked section. The 256x256 24-bit uncompressed color bitmaps that serve as input to the program were first passed through a contrast filter. This converted each image to a 256x256 24-bit uncompressed black & white bitmap containing the outlines in the image. This image was then fed to a splitting program, which split it into 59 different 64x64 24-bit uncompressed black & white bitmaps. Each of these was then fed to a coarse coding program, which generated 4 different 16x16 24-bit uncompressed black & white bitmaps from the 64x64 image. Finally, each set of 4 was sent to the neural network, which returned a score for the likelihood that a face appears in the 64x64 image from which the 4 lower-resolution images were derived. These scores can be ranked, and because the location of each 64x64 piece within the original 256x256 image is known, the coordinates of the face in the original image can be found.
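
How the coarse coding program produces its four 16x16 images is covered in its own section. The sketch below shows one common scheme, following the coarse-coded approach of Spirkovska and Reid listed in the references, and it may differ from what this project's program actually does. It assumes the contrast-filtered 64x64 image has been reduced to on/off pixels, that each coarse pixel covers a 4x4 block of fine pixels, and that field f is shifted by f fine pixels along both axes. Image and coarseCode are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<bool>>;  // image[y][x], true = outline pixel

// Builds `fields` coarse images from a square binary image whose side is a
// multiple of `fields`. Each coarse pixel covers a fields x fields block of
// fine pixels, and field f starts f fine pixels right of and below the
// image origin. A coarse pixel is on if any fine pixel under it is on.
std::vector<Image> coarseCode(const Image& fine, std::size_t fields) {
    assert(fields > 0 && !fine.empty() && fine.size() % fields == 0);
    const std::size_t n = fine.size();
    const std::size_t m = n / fields;  // side length of each coarse field
    std::vector<Image> out(fields, Image(m, std::vector<bool>(m, false)));
    for (std::size_t f = 0; f < fields; ++f) {
        for (std::size_t y = f; y < n; ++y) {
            assert(fine[y].size() == n);  // the input must be square
            for (std::size_t x = f; x < n; ++x) {
                // Fine pixels before the shifted origin (x < f or y < f)
                // are not covered by field f and are skipped by the loops.
                if (fine[y][x]) out[f][(y - f) / fields][(x - f) / fields] = true;
            }
        }
    }
    return out;
}
```

Calling coarseCode(image, 4) on a 64x64 image yields four 16x16 fields. The usual motivation is the size of a third-order network: if every triple of input pixels were connected, a 64x64 input would have about 1.1 x 10^10 triples, while a single 16x16 field has 2,763,520.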
 

Results

The results were mostly positive. The biggest problem was the number of false positives. While many images had the face as the top-rated piece (the Courteney Cox image below, for example), some had the face at #10 (the bottom right image). Since there are 59 pieces in total, even the latter isn't too bad. These problems could be alleviated with a better image splitting program, a better contrast filter, better training patterns, more angular resolution in the network, etc. Given how little fine-tuning time was put into the development of this code (almost none), the results are very good, and I'm confident that they could be improved with more. After the network was fully trained, images were used to test the program. Here are the results from a few test runs (click on the images to see the ratings):





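
The rankings quoted in this section can be computed from the network scores with a few lines of code. The following is a minimal sketch, not the project's code: Window, rankWindows, and rankOfPoint are hypothetical names, higher scores are assumed to mean more face-like, and the face position is assumed to be a hand-labelled point in the original 256x256 image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Top-left corner of one 64x64 piece in the 256x256 image, plus its score.
struct Window { int x, y; double score; };

// Returns the windows ordered from most to least face-like.
std::vector<Window> rankWindows(std::vector<Window> windows) {
    std::stable_sort(windows.begin(), windows.end(),
                     [](const Window& a, const Window& b) { return a.score > b.score; });
    return windows;
}

// 1-based rank of the best-scoring window that contains the point (px, py),
// or 0 if no window contains it. `ranked` must come from rankWindows.
std::size_t rankOfPoint(const std::vector<Window>& ranked, int px, int py, int size = 64) {
    assert(size > 0);
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        const Window& w = ranked[i];
        if (px >= w.x && px < w.x + size && py >= w.y && py < w.y + size) return i + 1;
    }
    return 0;
}
```

The first element returned by rankWindows gives the detected face coordinates, and rankOfPoint reports figures such as #1 or #10 for a test image whose face position is known.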
 

References

Human Face Recognition Using Third Order Synthetic Neural
Networks. O. Uwechue, A. Pandya.

Learning, invariance, and generalization in high-order neural
networks. C. Giles, T. Maxwell. Applied Optics. Vol. 26 No. 23. p. 4972

Coarse-Coded Higher-Order Neural Networks for PSRI Object
Recognition. L. Spirkovska, M. Reid. IEEE Trans. Neural Networks. Vol. 4,
No. 2. p. 276.

Encoding Geometric Invariances in Higher-Order Neural Networks. C. Giles,
R. Griffin, T. Maxwell. Neural Information Processing Systems. p. 301.
 

Code

Here's the code directory. Compile all of the .cpp files and name each executable after the .cpp file it was built from. The trained weight files are included in the directory. To test the network on an image, run "./evalimage imagein.bmp". Note that the bitmap has to be 256x256, 24-bit, and uncompressed (black & white or color), and the name of the file has to end in "in.bmp". After the script runs (it takes ~90 seconds on an Athlon 800), a sorted list of images is output. Because the bitmap has to be preprocessed, many intermediate files are also created. Look at the files listed in the output to see whether they actually contain faces.
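
Because evalimage expects a very specific input, it can help to check a file before running the full pipeline. The sketch below is not part of the project's code. It assumes the standard BMP layout with a BITMAPINFOHEADER of at least 40 bytes, and readLE and checkInput are hypothetical helper names. It rejects top-down bitmaps (stored with a negative height), since it is not known whether the project's tools accept them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a little-endian unsigned integer of `bytes` bytes starting at `offset`.
std::uint32_t readLE(const std::vector<std::uint8_t>& data, std::size_t offset,
                     std::size_t bytes) {
    assert(bytes <= 4 && offset + bytes <= data.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    return value;
}

// Returns an empty string if the file name and the leading header bytes
// match the stated requirements, otherwise a short reason for rejection.
std::string checkInput(const std::string& filename,
                       const std::vector<std::uint8_t>& header) {
    const std::string suffix = "in.bmp";
    if (filename.size() < suffix.size() ||
        filename.compare(filename.size() - suffix.size(), suffix.size(), suffix) != 0)
        return "file name must end in \"in.bmp\"";
    if (header.size() < 34 || header[0] != 'B' || header[1] != 'M')
        return "not a BMP file";
    if (readLE(header, 14, 4) < 40)   // size of the info header
        return "unsupported BMP header";
    if (readLE(header, 18, 4) != 256 || readLE(header, 22, 4) != 256)  // width, height
        return "image must be 256x256";
    if (readLE(header, 28, 2) != 24)  // bits per pixel
        return "image must be 24-bit";
    if (readLE(header, 30, 4) != 0)   // compression, 0 = uncompressed
        return "image must be uncompressed";
    return "";
}
```

In practice the header argument would hold the first 54 bytes of the file, read in binary mode.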