CS153 Fall 2000 Assignment 3

Harvey Mudd College
Computer Science 153
Assignment 3
Due Friday, October 13, by midnight

Back to Assignment 3, top-level page

Back to Assignment 3, Section 1: Statistical Object Recognition

Section 2: OCR and Structural Pattern Recognition

Distinguishing hand-written characters with statistical means is complicated by the enormous discrepancy with which those characters can be written. Almost any character can be "rigged" to have a given response to some statistical shape measure -- Scott Kim, for example, demonstrates the incredible flexibility of our recognition process. This section asks you to write systems to use both statistics and structural information to identify characters, though not necessarily letters of the alphabet.

Statistical pattern matching: Set!

In class we looked at examples of clusters of democratic and republican presidents along axes of height and popular vote; we also saw groups of binarized digits along axes of dominant orientation and circularity. This problem asks you to use similar statistical techniques to create a system that plays the visual-perception game of Set.

Set is a card game in which each card has a different image. The images consist of one, two, or three "tokens" or "symbols"; the tokens are one of three colors (red, green, or purple); they are one of three shapes (diamond, oval, or squiggle), and they are shaded in one of three ways. Several of these cards are laid out on a table: the goal is to determine, as quickly as possible, a "set" of three cards for which every token characteristic is completely shared or pairwise distinct. (An online set of examples is at this site.

This problem seeks to build one piece of an automated set player -- namely, the piece that will determine which of three shapes is currently under consideration. (One possible extension is to complete the automated set player by building recognizers for the color, number, and shading characteristics, too.)

The problem Write a matlab script or function that takes in a binary image of a token (a single symbol, either diamond, oval, or squiggle). It should then output the name of the input symbol (or a number representing the particular symbol that was seen). You can assume that only those three symbols will be input, but you should not assume anything about their size, location, or orientation with respect to the surrounding image. A database of images of the Set symbols is available in

/cs/cs153/Images/a3/Set

or can be downloaded as a tar file from this link.

in which there are 21 examples of diamonds, ovals, and squiggles of various sizes and in various orientations. The files are named L#.jpg, where L is one of the letters 'o', 'd', or 's' (for oval, diamond, or squiggle) and # is a number from 1 to 21. Thes images are grayscale, but they are not yet binary... you will need to threshold them in order to get binary images to interpret. This is an important step to preserve, because thresholding, in general, introduces noise into the symbols' shapes, and the system should be insensitive to that noise.

Suggested approach This problem is asking you to take a statistical approach to shape recognition. As a result, there are two main parts to consider. First, you may want to try several different statistical measures of the three shapes in a variety of poses and sizes in order to separate them as cleanly as possible. In class, we mentioned several such measures; you should also feel free to investigate ones of your own design. Second, analogous to the color-histogram-based recognition section prior to this one, you will need to determine how your system will decide which class to categorize new data into. Again, a number of possibilities are available: nearest mean, nearest neighbor(s), and maximum likelihood estimation (assuming Normally distributed clusters). Depending on how ambitious you feel, you might instead want to consider using neural networks (Matlab has a full set of built-in resources for supporting them). That and other possibilities are considered in the extensions.

In your results, be sure to include links to your code, a confusion matrix from the examples you tested (your confusion matrix should be 3x3), and a short rationale behind your system's design.

Results:

Structural Object Recognition and Decision Trees

One simplifying characteristic in the previous problem is that the symbols to be identified were all machine-printed. Handwritten symbols add an additional layer of complexity to identifying distinct classes of symbols; often a qualitative, or structural, approach works better in such cases.

Distinguishing letters For this problem, choose a subset of at least 10 letters of the alphabet that your system will distinguish. (If you would like to tackle all 26, that certainly would be considered a sufficient extension to this assignment.) If you do choose to use a subset of the alphabet, you might want to consider what structural properties you plan to use to distinguish the letters as you choose the letters themselves.

Structural Features Next, choose at least three structural features with which to distinguish the characters you decided on above. You may want to use more than three to more effectively deciode between the characters in your set. In class, we discussed a number of different discrete attributes with which you might distinguish handwritten letters. Those included

Any of the continuous, statistical shape parameters, discretized into a small number of categories. For example, "high" vs. "low" aspect ratio.
Background regions within the convex hull of the characeters, i.e. "convex deficiencies." In particular, you might use the following:
the types of background regions (lake vs. bay)
the number of background regions
the general size of background regions
the locations or relative locations of those regions
characteristics of their shape (convexity)

In addition, keep in mind that each character need not have only a single description in terms of these structural features.

Building the decision tree Write matlab functions (or scripts) that input a binary images of a single character and output that character's value for each of the structural features you've chosen. For at least 5 of each letter you're working with, compute their structural features and include the results in a table (a link to an ascii file is fine) in the results section below. For example, a couple of lines from such a table might look like the following:

Letter     Types/Locations of lakes/bays      Size
------     -----------------------------      ----
  Q        0 (center)                          L
  R        0 (top), 4 (right), 1 (bottom)     M,S,S

Here, the encodings ( 0 = lake, 1 = south-facing bay, 2 = west-facing bay, 3 = north-facing bay, 4 = east-facing bay, L = large, M = medium, S = small ) are being used. Feel free to put together your own representation.

With the data from this table, compute the best first attribute on which to build a decision tree. That is, which attribute provides the most information about which character is present? How much better is that attribute than the others you considered?

Which attribute provides the most information, as a percent of the maximum possible information that attribute could provide (given the number of different values it can take)? For example, a two-value attribute can provide at most 1 bit of information, while an eight-value atribute could provide up to 3 bits of information. If each provides 0.5 bits, the former, binary attribute is 50% as effective as it might be, while the eight-ary (octal?) attribute is only 6.25% as effective as it could be.

Finally, create a decision tree (it need not be completely optimal, if that decreases its flexibility) for recognizing the letters in your set. Try it on several additional letters of each type. What recognition rates does your system produce? Be sure to include the table of data, the information-theoretic "best" attributes, the decision tree, and these recognition results in the section below.