Project Description



Optical Character Recognition is the process of converting a digital image of text into a corresponding digital representation of that text.

Applications include:
We set out to build a neural-network-based OCR system that runs on Android phones and solves simple mathematical expressions involving the digits 0-9, addition, and subtraction. The program accepts an input image, normalizes it and extracts the characters, attempts to identify each character, then tokenizes and solves the equation. Using a network with two hidden layers trained with scaled conjugate gradient backpropagation, we achieved a successful classification rate of over 96.5%.

Method



We broke this problem into three parts: image processing, character recognition with a neural network, and equation solving.
Image processing
To give the neural network good inputs, characters must first be isolated and extracted from the image, then normalized for color, size, and position. Because our focus was on the character recognition aspect of the project, we used a relatively naive form of image processing.

Original Image

Cropped and converted to black and white

We crop and remove color in the same step. All pixels below a predefined grayscale threshold are converted to white, the rest black. Starting from the outside of the image, rows/columns are deleted until the row/column contains a black pixel.
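The thresholding and cropping step can be sketched in Python as follows. This is a minimal illustration, not the project's code: the function names, the list-of-rows image layout, and the threshold value are assumptions. The polarity follows the description above (values below the threshold become white); flip the comparison if your grayscale convention is the opposite.

```python
def binarize(gray, threshold):
    """Convert rows of grayscale values to booleans, where True means black.

    Assumed polarity, taken from the text: below the threshold is white.
    """
    return [[value >= threshold for value in row] for row in gray]


def crop_to_content(black):
    """Delete outer rows and columns until each border holds a black pixel."""
    rows = [r for r, row in enumerate(black) if any(row)]
    if not rows:
        return []  # blank image: nothing to keep
    cols = [c for c in range(len(black[0])) if any(row[c] for row in black)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in black[top:bottom + 1]]


gray = [
    [0, 0, 0, 0],
    [0, 200, 0, 0],
    [0, 0, 220, 0],
    [0, 0, 0, 0],
]
cropped = crop_to_content(binarize(gray, threshold=128))
```

Here `cropped` keeps only the 2 by 2 region that spans the two black pixels.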

Character extraction

Scanning across the columns of the image, a character starts at the first column containing a black pixel and ends at the next column containing only white pixels. Red indicates the start of a character, green the end. There are many false positives here, but they are filtered out because they are too narrow. The big assumption we make here is that characters do not overlap.
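A minimal Python sketch of this column scan is shown below. The function name and the `min_width` cutoff are hypothetical; the text only says that overly narrow segments are discarded, not what width is used.

```python
def split_characters(black, min_width=3):
    """Return (start, end) column ranges, end exclusive, one per character.

    `black` is a list of rows of booleans where True means a black pixel.
    `min_width` is an assumed cutoff for rejecting narrow false positives.
    """
    if not black:
        return []
    width = len(black[0])
    column_has_ink = [any(row[c] for row in black) for c in range(width)]

    spans = []
    start = None
    for c, has_ink in enumerate(column_has_ink):
        if has_ink and start is None:
            start = c                  # first black column: character starts
        elif not has_ink and start is not None:
            spans.append((start, c))   # first all-white column: character ends
            start = None
    if start is not None:
        spans.append((start, width))   # character touches the right edge

    return [(s, e) for s, e in spans if e - s >= min_width]


row = [bool(v) for v in [1, 1, 1, 0, 1, 0, 1, 1, 1, 1]]
spans = split_characters([row], min_width=3)
```

The one-column segment at index 4 is dropped as too narrow, leaving two characters. As in the text, this only works when neighbouring characters are separated by at least one all-white column.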




Character normalization

Each character is recropped and centered in a square frame. A 16 by 16 grid is overlaid onto the character. If a box contains more black pixels than a predefined threshold, the corresponding pixel in the final 16 x 16 image is set to black.
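The grid downsampling can be sketched in Python as below. This is an illustration under stated assumptions: the input is an already cropped character stored as rows of booleans, and the threshold is expressed as a fraction of each box (`fill_threshold`), which is a guess at how the predefined threshold might be defined.

```python
def normalize_character(black, size=16, fill_threshold=0.5):
    """Center a cropped character in a square frame and reduce it to size x size."""
    height, width = len(black), len(black[0])
    side = max(height, width)
    # Offsets that center the character inside a side x side frame.
    top = (side - height) // 2
    left = (side - width) // 2

    def is_black(y, x):
        # Frame pixels that fall outside the character are white.
        cy, cx = y - top, x - left
        return 0 <= cy < height and 0 <= cx < width and bool(black[cy][cx])

    grid = []
    for row_index in range(size):
        y0 = row_index * side // size
        y1 = max((row_index + 1) * side // size, y0 + 1)  # at least one pixel
        row = []
        for col_index in range(size):
            x0 = col_index * side // size
            x1 = max((col_index + 1) * side // size, x0 + 1)
            box = [is_black(y, x) for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(box) / len(box) > fill_threshold)
        grid.append(row)
    return grid


# A solid 32-row by 16-column stroke ends up centered horizontally.
tall = [[True] * 16 for _ in range(32)]
grid = normalize_character(tall)
```

For the tall stroke above, every output row has four white cells, eight black cells, and four white cells.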

Output

Finally the character is transformed into a 16 by 16 boolean array for input into the neural network.

Character recognition
We used a 3-layer feed-forward network trained with scaled conjugate gradient backpropagation. Training data was a combination of 163 characters we gathered by soliciting equations written on a whiteboard and 1593 preprocessed characters from the UCI Machine Learning Repository. Inputs were a 16 by 16 boolean array, and outputs were a 1 by 12 one-hot encoding of the digits 0 through 9 and the symbols + and -.
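The input and output encodings might look like the following Python sketch. The ordering of the 12 classes in `LABELS` is an assumption; the text only states which symbols are encoded, not their positions.

```python
# Assumed class order: digits first, then + and -.
LABELS = "0123456789+-"


def flatten(grid):
    """Turn a 16 x 16 boolean grid into a flat list of 256 network inputs."""
    return [1.0 if cell else 0.0 for row in grid for cell in row]


def encode_label(symbol):
    """One-hot encode a symbol as a list of 12 values."""
    vector = [0] * len(LABELS)
    vector[LABELS.index(symbol)] = 1
    return vector


def decode_output(outputs):
    """Pick the symbol whose output neuron has the highest activation."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return LABELS[best]


plus = encode_label("+")
guess = decode_output([0.1] * 11 + [0.9])
```

Decoding by the largest activation is a common convention for one-hot outputs; the source does not say how ties or low-confidence outputs were handled.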

Training was done using the MATLAB Neural Network Toolbox. We experimented with different network sizes, training functions, input sizes, and transfer functions.

We obtained our best results with a 65 x 58 network with 12 output neurons. Like some of our sources, we found scaled conjugate gradient training much more effective and faster than Levenberg-Marquardt backpropagation training. We also achieved better results with the logsig transfer function than with the tansig transfer function.
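To make the network shape concrete, here is a minimal Python sketch of a forward pass through a 256-65-58-12 network using the logsig function, logsig(x) = 1 / (1 + exp(-x)). The weights are random placeholders, not trained values, and applying logsig on the output layer as well as the hidden layers is an assumption. Training itself (scaled conjugate gradient) is not shown.

```python
import math
import random


def logsig(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Create placeholder weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, inputs):
    """Propagate an input vector through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            logsig(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
sizes = [256, 65, 58, 12]  # 16 x 16 inputs, two hidden layers, 12 outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
outputs = forward(layers, [0.0] * 256)
```

The logsig function maps activations into the range 0 to 1, which matches the 0/1 targets of a one-hot encoding, whereas tansig maps into -1 to 1.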

Scaled conjugate gradient training
Levenberg-Marquardt 30x30 network training

Results



70% of the training set is used for training, 30% for validation and testing. Training runs for a maximum of 1000 epochs. Final success rate is the percentage of correct classifications on the entire data set averaged over 10 runs.
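The data split and the success metric can be sketched in Python as follows. The text gives 30% for validation and testing combined; dividing that evenly (15% each) is an assumption here, as are the function names.

```python
import random


def split_dataset(n_samples, rng, train=0.70, validation=0.15):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train)
    n_val = int(n_samples * validation)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],  # remainder is the test set
    )


def success_rate(predicted, expected):
    """Percentage of samples classified correctly."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


# 163 collected characters plus 1593 UCI characters.
train_idx, val_idx, test_idx = split_dataset(163 + 1593, random.Random(0))
rate = success_rate(list("12+3"), list("12-3"))
```

Averaging over 10 runs then amounts to repeating the split and training with different random seeds and taking the mean of the resulting rates.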

Training set   Network size    Training method   Success %
Ours           15 x 15         SCG               91.2
Ours           25 x 25         SCG               93.5
Ours           65 x 58         SCG               95.3
Ours           100 x 85        SCG               94.1
Ours           15 x 15         LM                89.63
Ours + UCI     30 x 30         LM                60.9
Ours + UCI     15 x 15         SCG               90.9
Ours + UCI     25 x 25         SCG               93.7
Ours + UCI     65 x 58         SCG               96.6
Ours + UCI     100 x 85        SCG               97.35
Ours + UCI     50 x 45 x 25    SCG               96.05
Ours + UCI     70 x 25 x 25    SCG               95.73

We successfully ported our image processing code along with our neural network onto the Android platform. The application accepts a picture and attempts to solve the given equation. The app is low on features but serves as a proof of concept.
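The final equation-solving step, tokenizing the recognized characters and evaluating the expression, can be sketched in Python as below. This is an illustrative sketch rather than the app's code: the function names are hypothetical, and the handling of a leading or repeated sign is an assumption.

```python
def tokenize(characters):
    """Group recognized characters into integers and + / - operators."""
    tokens = []
    digits = ""
    for ch in characters:
        if ch.isdigit():
            digits += ch  # consecutive digits form one number
        elif ch in "+-":
            if digits:
                tokens.append(int(digits))
                digits = ""
            tokens.append(ch)
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if digits:
        tokens.append(int(digits))
    return tokens


def solve(tokens):
    """Evaluate the tokens left to right; a '-' flips the sign of the next number."""
    total = 0
    sign = 1
    for token in tokens:
        if token == "-":
            sign = -sign
        elif token == "+":
            pass  # a plus leaves the pending sign unchanged
        else:
            total += sign * token
            sign = 1
    return total


tokens = tokenize("12+7-5")
result = solve(tokens)
```

Because only addition and subtraction are supported, a single left-to-right pass is enough and no operator precedence handling is needed.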


Code



Sources



Matan, O.; Baird, H.S.; Bromley, J.; Burges, C.J.C.; Denker, J.S.; Jackel, L.D.; Le Cun, Y.; Pednault, E.P.D.; Satterfield, W.D.; Stenard, C.E.; Thompson, T.J., "Reading handwritten digits: a ZIP code recognition system," Computer, vol. 25, no. 7, pp. 59-63, July 1992

Shoeb Shatil, Adnan Md. Research Report on Bangla Optical Character Recognition Using Kohonen Network. Tech. Center for Research on Bangla Language Processing, n.d. Web. http://www.panl10n.net/english/final%20reports/pdf%20files/Bangladesh/BAN26.pdf.

Sloss, Steven. "Handwritten Equation Intelligent Character Recognition with Neural Networks." Harvey Mudd College, 2007. http://www.math.hmc.edu/~dyong/math164/2007/sloss/finalreport.pdf.

Xiaojun Zhai; Bensaali, F.; Sotudeh, R., "OCR-based neural network for ANPR," Imaging Systems and Techniques (IST), 2012 IEEE International Conference on, pp. 393-397, 16-17 July 2012