PBC Assignment 4

Due Friday, April 4

The purpose of this project is to explore the challenges of free-sketch recognition. In general, you will implement an algorithm that takes a freely drawn sketch and attempts to discern some level of structure in it. You have two options for what, specifically, you implement.

Option 1: Parsing freely-drawn notes
In this option, your goal is to take a set of freely-drawn notes and attempt to classify the strokes in the sketch as either handwriting or diagram strokes.

Option 2: Recognizing freely-drawn sketches
In this option, you will take a freely drawn diagram (with no surrounding notes) and attempt to pull apart the diagram into individual shapes and then, optionally, recognize those shapes.

With both of the above options, the emphasis is on grouping parts of the sketch into individual components, not necessarily on recognizing those components. However, you may choose to implement some level of recognition, either for extra credit, or to improve the recognizer.

This assignment is the most open-ended of the four, with the most room for creativity and exploration. Your grade will be based not only on your final results, but also on the thought you demonstrate and the range of solutions you explore (and how well you discuss the results and the solution you finally choose). As usual, in class we will be talking about a number of different approaches for solving this problem.

Basic Functionality [60 points]

The basic functionality this week will depend a little on which option you have chosen above, so I'll specify those choices when relevant.

[5 points] Allows the user to sketch freely and then invoke recognition when the sketch/drawing is complete, or load a sketch from a file in MIT SketchML and then invoke recognition on the loaded sketch. (I will provide a library that provides a basic ability to read MIT SketchML files into WPF ink, but you will probably need to extend that library so that it extracts more information, for example from labeled files.)
[50 points] In the recognition stage, your application should:

Option 1: distinguish between handwriting strokes and diagram strokes and make some attempt to group strokes based on spatial or temporal proximity (e.g., group all the strokes in a "single diagram" or "single block of text")
Option 2: group the individual shapes in the diagram into distinct groups. You do not need to recognize the individual symbols (but you may choose to, in order to help your grouping).

[5 points] After recognition, the application should display the results of recognition in a meaningful visual form. It's up to you to come up with a scheme for displaying the results, but your scheme should be sufficient so that you can detect errors.
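
As a starting point for the proximity-based grouping mentioned under Option 1 above, here is a minimal sketch in Python (not the WPF/C# setting of the provided library; it is only meant to show the idea). The `Stroke` class, the `start_ms` timestamp field, and the `max_gap` and `max_pause_ms` thresholds are all assumptions made for illustration, not part of SketchML or of any provided code. Two strokes are merged when their bounding boxes are close and they were drawn close together in time.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """Hypothetical stroke record: sampled points plus a pen-down time."""
    points: List[Point]
    start_ms: float

    def bbox(self) -> Tuple[float, float, float, float]:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return min(xs), min(ys), max(xs), max(ys)


def bbox_gap(a: Stroke, b: Stroke) -> float:
    """Distance between two bounding boxes (0 if they touch or overlap)."""
    ax0, ay0, ax1, ay1 = a.bbox()
    bx0, by0, bx1, by1 = b.bbox()
    dx = max(bx0 - ax1, ax0 - bx1, 0.0)
    dy = max(by0 - ay1, ay0 - by1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_strokes(strokes: List[Stroke],
                  max_gap: float = 20.0,
                  max_pause_ms: float = 1500.0) -> List[List[int]]:
    """Return groups of stroke indices using union-find over close pairs."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            near = bbox_gap(strokes[i], strokes[j]) <= max_gap
            soon = abs(strokes[i].start_ms - strokes[j].start_ms) <= max_pause_ms
            if near and soon:  # assumed rule: require both kinds of proximity
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Requiring both spatial and temporal closeness is only one possible rule. Replacing `and` with `or`, or weighting the two distances, gives different groupings that may be worth comparing in your testing.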

How you choose to do the recognition is totally up to you, but you should base your approach loosely on something we have read in class. Many of those techniques are quite complex, though, and would require much more time than you have for this assignment, so you should feel free to implement a simplified version of a technique. For example, one of the papers might present an algorithm that takes the output of a support vector machine and feeds it into a hidden Markov model. The core idea behind such a paper is that temporal information can aid purely structural information. In your approach, you might therefore try something simpler: use a support vector machine (or a simpler classifier) to classify each stroke, and then adjust each result based on the results for the surrounding strokes (either in time or space).
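
The "adjust the results based on surrounding strokes" step can be sketched very simply. The Python below is an illustration only, assuming you already have one label per stroke (from whatever classifier you choose) listed in drawing order. The function name `smooth_labels` and the `window` parameter are hypothetical. It replaces each label with the majority label among its temporal neighbors and keeps the original label when there is a tie.

```python
from collections import Counter
from typing import List


def smooth_labels(labels: List[str], window: int = 2) -> List[str]:
    """Majority-vote smoothing over a sliding window of temporal neighbors.

    labels: per-stroke classifier output, in the order the strokes were drawn.
    window: number of strokes to look at on each side (an assumed parameter).
    """
    smoothed = []
    for i, label in enumerate(labels):
        lo = max(0, i - window)
        hi = min(len(labels), i + window + 1)
        counts = Counter(labels[lo:hi])
        best, best_count = counts.most_common(1)[0]
        # Keep the classifier's own answer unless a different label
        # strictly outvotes it in the neighborhood.
        if counts[label] == best_count:
            best = label
        smoothed.append(best)
    return smoothed
```

For instance, a lone "shape" label in the middle of a run of "text" strokes is flipped to "text", while a sustained run of "shape" labels is left alone. This is a much weaker use of temporal context than a hidden Markov model, but it captures the same intuition.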

Note that while you do not have to recognize the stroke groups you create, you may find that it helps you do the grouping more accurately. For example, you might feed a group of strokes recognized as "text" to the text recognizer. If it comes back with nonsense, you might adjust the group.

The grade for recognition this week will be a function of how sophisticated your approach to recognition is and how well it works. If you implement a trivial method (e.g., what we did in class) you will not get very much credit. You don't need to go overboard, but you should put some thought into your approach and try to improve on the trivial, basic approaches (again, basing your approach on something we read in class is a good start).

Note that you MAY NOT use the built-in Ink Analysis for separating text from drawings for this assignment. :)

Supplemental Tools and Data

To aid with this assignment, I've provided a number of datasets and some utilities to help you work with this data:

Download labeler and converter
There are two programs included in this zip file. JntToXML converts Microsoft Journal files to the MIT SketchML format we looked at in class. You can find documentation on the format here.
The labeler program allows the user to label sketches that are in the MIT SketchML format. It is pen-based and designed to run on the Tablet PC.
A very simple program to read SketchML into WPF is here. Note that this program will ignore any shape or label information in the SketchML file, so you'll need to modify it if you want to read that information in. Note that it's just a library--you can't actually run it without including it in a solution.
Labeled Digital Logic Diagrams in MIT SketchML format
Unlabeled Journal Files (Digital Logic Notes--text and diagrams)
Unlabeled Journal Files (Analog Circuit Notes--text and diagrams)

In all of the above datasets, the files are named according to user. That is, the first four digits in the file name are the user id. This will help you if you want to do any user-specific learning or simply compare results across users.
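
If you want to split the data by user, the naming convention above makes that straightforward. The Python sketch below assumes the file name begins with the four-digit user id. The example file names used with it are made up for illustration and are not actual names from the datasets.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def user_id(filename: str) -> str:
    """Return the four leading digits of the file name as the user id."""
    name = Path(filename).name  # drop any directory part
    match = re.match(r"\d{4}", name)
    if match is None:
        raise ValueError(f"no four-digit user id at start of {name!r}")
    return match.group(0)


def group_by_user(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each user id to the list of that user's files."""
    by_user = defaultdict(list)
    for filename in filenames:
        by_user[user_id(filename)].append(filename)
    return dict(by_user)
```

With this in place you could, for example, train on some users and test on others, or report results per user.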

For Option 2, you can use the labeled XML data files for digital logic diagrams to start out, but you should also try out your algorithm on analog circuit data (see below). To do this, you'll have to extract sketches from the analog circuit notes, save them in their own journal files, and then convert them. We'll go through this process in class.

For Option 1, you can use the notes files intact, but you'll have to convert them to SketchML format first.

Advanced Functionality

If you get the basic functionality working, here are several optional extensions you can try:

Option 1: Recognize some structure in the text that you find (e.g., lines of text, bulleted lists, etc.); Option 2: Recognize the shapes that you group
Provide a dynamic error correction interface. For example, allow the user to correct grouping mistakes through some kind of interactive interface, then re-recognize the surrounding area based on this new information.
Compare and contrast more than one method of doing grouping
Fully implement some of the more "advanced" techniques for grouping (talk to me about this beforehand)

Testing and Writeup [40 points]

Once again, a major component of this project will be your writeup. Submit the same three files as in the previous assignments: a README file, a Design file and a Testing file. Details on what should be in each are given below. BE SURE TO INCLUDE ALL THREE FILES AND ANSWER ALL THE QUESTIONS BELOW.

Readme file

In a file called README.txt describe how to run your program, how to use it, which option you chose, any missing features or bugs that you are aware of, and any extra credit that you implemented.

Design/approach Justification

In the first part of your writeup, you will analyze your design. In a file called "Design.txt", write up answers/explanations for the following (be sure to explicitly address all three points):

Describe your algorithm.
How did you develop your algorithm? Give a rationale for how you ended up with your current approach.
Give an overview and critique of your design from a software engineering perspective. What did you do better than in previous assignments? What would you change next time?

Testing and Results

In the second part of your writeup, you will report on how well your algorithm performs. You can focus on qualitative data, but please BE SPECIFIC when discussing what works and what does not.

Test your algorithms on a variety of real-world data and report qualitative recognition results by giving examples of a few exemplary runs as screen shots in your writeup. That is, choose a few examples that illustrate the functionality of your approach and, in your report, describe why you chose the example and what it illustrates. Your goal is to give me a good idea of the strengths and weaknesses of your approach. You may include one or two "canned" examples (that you draw yourself) if necessary, but try to include as much real-world data as possible. You'll probably want to include 4-6 different runs, but choose as many as you feel you need to illustrate what's happening in your algorithm.

A note on domain: you should include at least one example from each domain (digital and analog) even if your algorithm was optimized for one domain or the other, and comment on how well your algorithms generalize across domains.

Your testing file this week will probably be considerably longer than other weeks (especially if you count the screen shots in the page count).

Criteria

Your writeup will be graded on the following criteria:

Overall completeness: Did you provide all the information requested above?
Soundness/completeness of your Justification: Is your design justification compelling (both for your algorithms and your code)? Are there solid reasons for your decisions? Even if your design is not ideal, do you clearly identify what could be improved?
Depth and Breadth of results: Do you discuss your results at a non-trivial level in order to illustrate the high-level strengths and weaknesses of your algorithms? Have you tested your code on a wide variety of real-world examples?
Clarity of expression: Is your writeup easy to read and grammatically sound?

What to Submit

Through Sakai, submit the following:

Your code, including everything I need to run it
The three writeup files described above
