Math OCR - Alistair Dobke, Mark Mann

Project Description

Optical Character Recognition is the process of converting a digital image of text into a corresponding digital representation of that text.

Applications include:

Conversion of written paper to digital documents
Pen-based character input on tablet computers
Automated sorting of mail based on ZIP code
Check processing in ATMs
Book digitizing

We set out to create a neural-network-based OCR system that runs on Android phones and solves simple mathematical expressions involving the digits 0 through 9, addition, and subtraction. The program accepts an input image, normalizes it and extracts the characters, attempts to identify each character, and then tokenizes and solves the equation. Using a network with two hidden layers trained with scaled conjugate gradient backpropagation, we achieved a successful classification rate of over 96.5%.

Method

We broke this problem into three parts: image processing, character recognition with a neural network, and equation solving.

Image processing: To give the neural network good inputs, characters must first be isolated and extracted from the image, then normalized for color, size, and position. Because our focus was on the character recognition aspect of the project, we used a relatively naive form of image processing.

Original Image

Cropped and converted to black and white
We crop and remove color in the same step. All pixels below a predefined grayscale threshold are converted to white, the rest to black. Starting from each edge of the image, rows and columns are deleted until one containing a black pixel is reached.
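The thresholding and cropping step can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function names, the list-of-lists image layout, and the default threshold of 128 are assumptions, and the comparison direction simply follows the sentence above.

```python
def to_black_and_white(gray, threshold=128):
    """Return a grid of booleans where True marks a black pixel.

    Assumption: `gray` is a list of rows of grayscale values. Following
    the text, values below the threshold become white, all others black.
    """
    return [[value >= threshold for value in row] for row in gray]


def crop_to_content(bw):
    """Drop outer rows and columns that contain no black pixel."""
    rows = [i for i, row in enumerate(bw) if any(row)]
    if not rows:
        return []  # the image has no black pixel at all
    cols = [j for j in range(len(bw[0])) if any(row[j] for row in bw)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in bw[top:bottom + 1]]
```

Cropping only trims the outer border; white rows or columns between black pixels inside the bounding box are kept.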

Character extraction
We scan across the columns of the image. A character starts at the first column that contains a black pixel and ends at the next column that contains only white pixels. Red indicates the start of a character, green the end. There are many false positives here, but they are filtered out because they are too narrow. The big assumption we make is that characters do not overlap.
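The column scan described above might look like the following sketch. The function name and the `min_width` cutoff used to discard narrow false positives are illustrative assumptions; the text does not state the actual width limit.

```python
def split_characters(bw, min_width=3):
    """Return (start, end) column ranges, end exclusive, one per character.

    `bw` is a grid of booleans where True marks a black pixel.
    """
    if not bw:
        return []
    width = len(bw[0])
    spans = []
    start = None
    for col in range(width):
        has_black = any(row[col] for row in bw)
        if has_black and start is None:
            start = col                  # first black column: character starts
        elif not has_black and start is not None:
            spans.append((start, col))   # first all-white column: character ends
            start = None
    if start is not None:                # character touches the right edge
        spans.append((start, width))
    # Discard spans that are too narrow to be a real character.
    return [(s, e) for s, e in spans if e - s >= min_width]
```

Because the scan relies on an all-white column to separate characters, two characters that overlap horizontally would come out as a single span.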


Character normalization
Each character is recropped and centered in a square frame. A 16 by 16 grid is overlaid onto the character. If a box contains more black pixels than a predefined threshold, the corresponding pixel in the final 16 x 16 image is set to black.
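A rough sketch of this normalization, assuming the character has already been cropped to its bounding box. The function name is hypothetical, and expressing the threshold as a fraction of each grid box (`fill_ratio`) is an assumption; the text only says the threshold is predefined.

```python
def normalize_character(bw, size=16, fill_ratio=0.5):
    """Map a cropped boolean bitmap (True = black) to a size x size grid."""
    height, width = len(bw), len(bw[0])
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2

    # Center the character in a white square frame.
    frame = [[False] * side for _ in range(side)]
    for r in range(height):
        for c in range(width):
            frame[top + r][left + c] = bw[r][c]

    # Overlay the grid and threshold the black-pixel count in each box.
    grid = []
    for gr in range(size):
        r0 = gr * side // size
        r1 = max((gr + 1) * side // size, r0 + 1)   # at least one row per box
        row = []
        for gc in range(size):
            c0 = gc * side // size
            c1 = max((gc + 1) * side // size, c0 + 1)
            box = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(box) > fill_ratio * len(box))
        grid.append(row)
    return grid
```

A tall, narrow character such as "1" ends up as a vertical bar in the middle of the grid rather than being stretched to fill it.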

Output
Finally, the character is transformed into a 16 by 16 boolean array for input into the neural network.

Character recognition: We used a 3-layer feed-forward network trained with scaled conjugate gradient backpropagation. Training data was a combination of 163 characters we gathered by soliciting equations written on a whiteboard and 1593 preprocessed characters from the UCI Machine Learning Repository. Inputs were 16 by 16 boolean arrays, and outputs were 1 by 12 arrays holding a one-hot encoding of the digits 0 through 9 and the symbols + and -.
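To make the input and output encoding concrete, here is a small sketch. The class order in `LABELS` and the row-by-row flattening are assumptions made for illustration; the text does not say which output neuron corresponds to which symbol.

```python
# Assumed class order: digits first, then the two operators.
LABELS = "0123456789+-"


def flatten_input(grid):
    """Flatten a 16 x 16 boolean grid, row by row, into 256 values of 0.0 or 1.0."""
    return [1.0 if cell else 0.0 for row in grid for cell in row]


def one_hot(label):
    """Encode a character as a 12-element target vector."""
    vec = [0.0] * len(LABELS)
    vec[LABELS.index(label)] = 1.0
    return vec


def decode(outputs):
    """Pick the label whose output neuron has the largest activation."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return LABELS[best]
```

For example, `one_hot("+")` sets only the eleventh element to 1.0, and `decode` reverses the mapping by taking the position of the largest output.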

Training was done using the MATLAB Neural Network Toolbox. We experimented with different network sizes, training functions, input sizes, and transfer functions.

Our best results came from a 65 x 58 network with 12 output neurons. Like some of our sources, we found scaled conjugate gradient training much more effective and faster than Levenberg-Marquardt backpropagation training. We also achieved better results using the logsig transfer function than the tansig transfer function.
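For reference, the forward pass of a network with this shape can be sketched as follows. This is not the trained network: the weights here are random placeholders, training itself is not shown, and using the same transfer function on every layer (including the output layer) is an assumption. `logsig` is the logistic function and `tansig` is mathematically the hyperbolic tangent.

```python
import math
import random


def logsig(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x):
    """Hyperbolic tangent sigmoid, with outputs in (-1, 1)."""
    return math.tanh(x)


def make_layer(n_in, n_out, rng):
    """Placeholder random weights; a trained network would load learned values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, inputs, transfer=logsig):
    """Propagate an input vector through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            transfer(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
sizes = [256, 65, 58, 12]  # 16 x 16 inputs, two hidden layers, 12 outputs
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
outputs = forward(layers, [0.0] * 256)
```

With `logsig` every output lies strictly between 0 and 1, which lines up naturally with one-hot targets of 0 and 1.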

Scaled conjugate gradient training

Levenberg-Marquardt 30x30 network training
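The third part of the method, equation solving, takes the recognized characters, groups them into tokens, and evaluates the result. The sketch below is one simple way to do this and is not taken from the project's code; the function names are hypothetical, and grouping adjacent digits into multi-digit numbers is an assumption.

```python
DIGITS = "0123456789"


def tokenize(chars):
    """Group recognized characters into integer and operator tokens."""
    tokens = []
    number = ""
    for ch in chars:
        if ch in DIGITS:
            number += ch                 # keep collecting a multi-digit number
        elif ch in "+-":
            if number:
                tokens.append(int(number))
                number = ""
            tokens.append(ch)
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if number:
        tokens.append(int(number))
    return tokens


def solve(tokens):
    """Evaluate the tokens left to right using only addition and subtraction."""
    total = 0
    sign = 1
    pending_operator = False
    for tok in tokens:
        if isinstance(tok, int):
            total += sign * tok
            sign = 1
            pending_operator = False
        else:
            sign = sign if tok == "+" else -sign
            pending_operator = True
    if pending_operator:
        raise ValueError("expression ends with an operator")
    return total
```

With only addition and subtraction there is no operator precedence to handle, so a single left-to-right pass is enough.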

Results

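70% of the training set is used for training, 30% for validation and testing. Training runs for a maximum of 1000 epochs. The final success rate is the percentage of correct classifications on the entire data set, averaged over 10 runs. In the table, network size gives the number of neurons in each hidden layer, SCG is scaled conjugate gradient, and LM is Levenberg-Marquardt.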

Training set	Network size	Training method	Success %
Ours	15 x 15	SCG	91.2
Ours	25 x 25	SCG	93.5
Ours	65 x 58	SCG	95.3
Ours	100 x 85	SCG	94.1
Ours	15 x 15	LM	89.63
Ours + UCI	30 x 30	LM	60.9
Ours + UCI	15 x 15	SCG	90.9
Ours + UCI	25 x 25	SCG	93.7
Ours + UCI	65 x 58	SCG	96.6
Ours + UCI	100 x 85	SCG	97.35
Ours + UCI	50 x 45 x 25	SCG	96.05
Ours + UCI	70 x 25 x 25	SCG	95.73
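The evaluation protocol behind these numbers (a 70/30 split and a success percentage averaged over runs) can be sketched as below. The helper names and the seeded shuffle are assumptions for illustration; the actual split was handled by the training toolkit.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into two parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def success_rate(predicted, actual):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def average(values):
    """Mean of the per-run success rates."""
    return sum(values) / len(values)
```

Each run would train on the first part, then report `success_rate` over the whole data set; `average` combines the per-run figures into the single number shown per table row.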

We successfully ported our image processing code along with our neural network onto the Android platform. The application accepts a picture and attempts to solve the given equation. The app is low on features but serves as a proof of concept.

Code

Download Android application

Sources

Matan, O.; Baird, H.S.; Bromley, J.; Burges, C.J.C.; Denker, J.S.; Jackel, L.D.; Le Cun, Y.; Pednault, E.P.D.; Satterfield, W.D.; Stenard, C.E.; Thompson, T.J. "Reading handwritten digits: a ZIP code recognition system," Computer, vol. 25, no. 7, pp. 59-63, July 1992.

Shoeb Shatil, Adnan Md. Research Report on Bangla Optical Character Recognition Using Kohonen Network. Tech. Center for Research on Bangla Language Processing, n.d. Web. http://www.panl10n.net/english/final%20reports/pdf%20files/Bangladesh/BAN26.pdf.

Sloss, Steven. "Handwritten Equation Intelligent Character Recognition with Neural Networks." Harvey Mudd College, 2007. http://www.math.hmc.edu/~dyong/math164/2007/sloss/finalreport.pdf.

Xiaojun Zhai; Bensaali, F.; Sotudeh, R. "OCR-based neural network for ANPR," Imaging Systems and Techniques (IST), 2012 IEEE International Conference on, pp. 393-397, 16-17 July 2012.

OCR for Mathematical Expressions

Alistair Dobke and Mark Mann
CS152 (Neural Networks) Fall 2012

