Dmitriy Kogan
CS152 - Neural Networks
Professor R. Keller
Harvey Mudd College
Fall 2000
Proposal
Face detection is useful in many applications. For example, a face
identification program would want to find the face in the image before
it tries to identify it; lip reading software would want to locate the
lips before it could process the stream of images into text. Software that
can do face detection exists, some of it neural-network based and some not.
However, most of these programs have serious limitations. Some use skin
color to find the face, which is unreliable because of shadows on the face
and the variation in skin color between people. These programs are also
prone to false positives caused by skin-colored objects or other body
parts. Many face detection programs also handle geometric transformations,
such as rotations, poorly. This project attempts to overcome that limitation.
Problem statement
To design and build a PSRI (position, scale, rotation invariant) face
detection system. The output should be the coordinates of the found face
in the input image.
Approach
Before anything involving neural nets could be done, a lot of preprocessing
had to be performed on the images. Each of the components is described in
detail in its own hyperlinked section. The 256x256 24-bit color
uncompressed bitmaps, which serve as input to the program, were first
passed through a contrast filter. This converted
the image to a 256x256 24-bit black & white uncompressed bitmap containing
only the outlines of the image. This image was then fed to a splitting
program, which split the image into 59 different 64x64 24-bit black
& white uncompressed bitmaps. Each of these was then fed to a coarse
coding program, which generated 4 different 16x16 24-bit black & white
uncompressed bitmaps from each 64x64 piece. Finally, each set of 4 was
sent to the neural network to get a score for the likelihood that a face
appears in the 64x64 image from which the 4 lower-resolution images were
obtained. These scores can be ranked, and since the location of each 64x64
piece in the original 256x256 image is known, the coordinates of the face
in the original image can be found.
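The final ranking step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the PieceScore record, the function names, the assumption that a higher score means a more likely face, and the choice to report the centre of the best piece are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record for one 64x64 piece of the 256x256 image.
struct PieceScore {
    int x;         // left edge of the piece in the original image
    int y;         // top edge of the piece in the original image
    double score;  // network output; higher is assumed to mean "more face-like"
};

struct Point {
    int x;
    int y;
};

// Return the pieces ordered from most to least face-like.
// stable_sort keeps the original order among pieces with equal scores.
std::vector<PieceScore> rankPieces(std::vector<PieceScore> pieces) {
    std::stable_sort(pieces.begin(), pieces.end(),
                     [](const PieceScore& a, const PieceScore& b) {
                         return a.score > b.score;
                     });
    return pieces;
}

// Report the centre of the top-ranked piece as the face location.
// pieceSize defaults to the 64-pixel piece size described in the text.
Point locateFace(const std::vector<PieceScore>& pieces, int pieceSize = 64) {
    assert(!pieces.empty());
    const PieceScore best = rankPieces(pieces).front();
    return Point{best.x + pieceSize / 2, best.y + pieceSize / 2};
}
```

Calling locateFace on the 59 scored pieces would give a single coordinate pair. Because false positives can outrank the real face, it may be more useful in practice to inspect the first several entries returned by rankPieces.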
Results
The results were mostly positive, and I'm confident that they could be improved with more fine-tuning time. The biggest problem was the number of false positives. While many images had the face as the number one rated image (the Courteney Cox image below, for example), some had the face at #10 (the bottom right image). Since there are a total of 59 images, even a rank of 10 puts the face within the top 17% of candidates. These problems would be alleviated by a better image splitting program, a better contrast filter, better training patterns, more angular resolution in the network, etc. Given how little fine-tuning time was put into the development of this code (almost none), the results are very good. After the network was fully trained, images were used to test the program. Here are the results from a few test runs (click on the images to see the ratings):
References
Human Face Recognition Using Third Order Synthetic Neural Networks.
O. Uwechue, A. Pandya.
Learning, invariance, and generalization in high-order neural networks.
C. Giles, T. Maxwell. Applied Optics. Vol. 26, No. 23. p. 4972.
Coarse-Coded Higher-Order Neural Networks for PSRI Object Recognition.
L. Spirkovska, M. Reid. IEEE Trans. Neural Networks. Vol. 4, No. 2. p. 276.
Encoding Geometric Invariances in Higher-Order Neural Networks.
C. Giles, R. Griffin, T. Maxwell. Neural Information Processing Systems.
p. 301.
Code
Here's the code directory. Compile all of the .cpp files and rename each executable to match the name of its .cpp file. The trained weight files are included in the directory. To test the network on an image, run "./evalimage imagein.bmp". Note that the bitmap has to be 256x256, 24-bit, and uncompressed (black/white or color), and the name of the file has to end in "in.bmp". The script takes about 90 seconds on an Athlon 800 and then outputs a sorted list of images. Because the bitmap has to be preprocessed, many intermediate files are also created. Look at the files listed in the output to see whether they actually contain faces.
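Since the input requirements are strict, it can help to check a file before running the script. The following is a minimal sketch and not part of the project's code. It assumes the common BMP layout with a 40-byte (or larger) info header, where width, height, bits per pixel, and compression sit at byte offsets 18, 22, 28, and 30. The function names are hypothetical, and accepting a negative (top-down) height is a guess about what the programs tolerate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read an n-byte little-endian unsigned integer starting at offset off.
static std::uint32_t readLE(const std::vector<unsigned char>& b,
                            std::size_t off, int n) {
    assert(off + static_cast<std::size_t>(n) <= b.size());
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
    return v;
}

// True if s ends with the given suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Check the first 34 header bytes for a 256x256, 24-bit, uncompressed BMP.
bool headerLooksValid(const std::vector<unsigned char>& h) {
    if (h.size() < 34) return false;
    if (h[0] != 'B' || h[1] != 'M') return false;   // BMP magic bytes
    if (readLE(h, 14, 4) < 40) return false;        // assumed info header size
    const std::int32_t width = static_cast<std::int32_t>(readLE(h, 18, 4));
    const std::int32_t height = static_cast<std::int32_t>(readLE(h, 22, 4));
    const std::uint32_t bitsPerPixel = readLE(h, 28, 2);
    const std::uint32_t compression = readLE(h, 30, 4);  // 0 means uncompressed
    return width == 256 && (height == 256 || height == -256) &&
           bitsPerPixel == 24 && compression == 0;
}

// Check both the file name rule ("in.bmp" suffix) and the header contents.
bool inputLooksValid(const std::string& path) {
    if (!endsWith(path, "in.bmp")) return false;
    std::ifstream f(path, std::ios::binary);
    std::vector<unsigned char> h(34);
    f.read(reinterpret_cast<char*>(h.data()), 34);
    if (f.gcount() != 34) return false;  // missing or truncated file
    return headerLooksValid(h);
}
```

A small wrapper program could call inputLooksValid on its argument and print a warning before "./evalimage" is run, which avoids waiting through the preprocessing only to find out the bitmap was the wrong size or depth.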