The purpose of this project is to explore the challenges of free-sketch
recognition. In general, you will implement an algorithm that takes a
freely drawn sketch and attempts to discern some level of structure in
it. You have two options for what, specifically, you implement.
Option 1: Parsing freely-drawn notes
In this option, your goal is to take a set of freely-drawn
notes
and attempt to classify the strokes in the sketch as either handwriting
or diagram strokes.
Option 2: Recognizing freely-drawn sketches
In this option, you will take a freely drawn diagram (with no
surrounding notes) and attempt to pull apart the diagram into
individual shapes and then, optionally, recognize those shapes.
With both of the above options, the emphasis is on grouping parts of
the sketch into individual components, not necessarily on recognizing
those components. However, you may choose to implement some
level
of recognition, either for extra credit, or to improve the recognizer.
This assignment is the most open-ended of the four, with the most room
for creativity and exploration. Your grade will be based not
only
on your final results, but also on the thought you demonstrate and the
range of solutions you explore (and how well you discuss the results
and the solution you finally choose). As usual, in class we
will
be talking about a number of different approaches for solving this
problem.
Basic Functionality [60 points]
The basic functionality this week will depend a little on which option
you have chosen above, so I'll specify those choices when relevant.
- [5 points] Allows the user to sketch freely and then invoke
recognition when the sketch/drawing is complete, or load a sketch from a
file in MIT SketchML and then invoke recognition on the loaded sketch.
(I will provide a library that provides a basic ability to read MIT
SketchML files into WPF ink, but you will probably need to extend those
libraries so that they extract more information, for example from
labeled files, etc.)
- [50 points] In the recognition stage, your application
should:
- Option 1: distinguish between handwriting strokes and
diagram
strokes and make some attempt to group strokes based on spatial or
temporal proximity (e.g., group all the strokes in a "single diagram"
or "single block of text")
- Option 2: group the individual shapes in the diagram into
distinct groups. You do not need to recognize the individual
symbols (but you may choose to, in order to help your grouping).
- [5 points] After recognition, the application should
display the results of
recognition in a meaningful visual form. It's up to you to
come
up with a scheme for displaying the results, but your scheme should be
sufficient so that you can detect errors.
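As a starting point for the grouping step, here is a minimal Python sketch of grouping strokes by spatial or temporal proximity. It is only an illustration, not the required method: the stroke representation (a dict with "points", "start", and "end" keys), the function names, and the thresholds max_gap and max_pause are all assumptions made for this example, and a real solution would work on the ink objects loaded by the provided library.

```python
from itertools import combinations

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def box_gap(a, b):
    """Distance between two axis-aligned boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5

def group_strokes(strokes, max_gap=20.0, max_pause=0.5):
    """Group strokes that are close in space OR drawn close together in time.

    Each stroke is a dict with hypothetical keys:
      "points": list of (x, y), "start" and "end": times in seconds.
    Returns a list of groups, each a list of stroke indices.
    """
    parent = list(range(len(strokes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    boxes = [bounding_box(s["points"]) for s in strokes]
    for i, j in combinations(range(len(strokes)), 2):
        near_in_space = box_gap(boxes[i], boxes[j]) <= max_gap
        pause = max(strokes[j]["start"] - strokes[i]["end"],
                    strokes[i]["start"] - strokes[j]["end"])
        near_in_time = pause <= max_pause
        if near_in_space or near_in_time:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because groups are merged transitively, a chain of nearby strokes ends up in one group even if its two ends are far apart. That is one of the weaknesses you would want to look for when you display and evaluate your results.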
How you choose to do the recognition is totally up to you, but you
should base your approach loosely on something we have read in class.
Many of the techniques are quite complex, however, and would require
much more time than you have for this assignment. Therefore, you should
feel free to implement a simplified version of a technique. For example,
one of the papers might present an algorithm that takes the output of a
support vector machine and feeds it into a hidden Markov model. The core
idea behind this paper is that temporal information can aid purely
structural information. In your approach, you might therefore try
something simpler: use a support vector machine (or a simpler
classifier) to do the classification, and then adjust the results based
on the results from surrounding strokes (either in time or space).
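To make the "adjust the results based on surrounding strokes" idea concrete, here is a small Python sketch that smooths per-stroke labels with a majority vote over temporal neighbors. It is a deliberately simple stand-in for the hidden Markov model in the example above, not an implementation of any paper. The function name, the label strings, and the window parameter are assumptions for illustration.

```python
from collections import Counter

def smooth_labels(labels, window=2):
    """Relabel each stroke with the majority label among its temporal neighbors.

    labels: per-stroke labels in drawing order, e.g. "text" or "diagram".
    window: number of neighbors considered on each side of a stroke.
    If the stroke's own label is among the most common, it is kept.
    """
    smoothed = []
    for i, original in enumerate(labels):
        lo = max(0, i - window)
        hi = min(len(labels), i + window + 1)
        # Votes come from the ORIGINAL labels, so one pass cannot cascade.
        counts = Counter(labels[lo:hi])
        best_count = max(counts.values())
        winners = [label for label, c in counts.items() if c == best_count]
        smoothed.append(original if original in winners else winners[0])
    return smoothed
```

For example, an isolated "diagram" label in the middle of a run of "text" labels would be flipped to "text". The same voting idea could be applied to spatial neighbors instead of temporal ones.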
Note that while you do not have to recognize the stroke groups you
create, you may find that it helps you do the grouping more accurately.
For example, you might feed a group of strokes recognized as
"text" to the text recognizer. If it comes back with
nonsense,
you might adjust the group.
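One possible way to use recognition feedback of this kind is sketched below in Python. The recognizer is passed in as a plain function because the actual text recognizer API is not described here. The names recognize, vocabulary, and refine_text_group, and the "half the words are known" test, are all hypothetical choices for this example.

```python
def looks_like_text(result, vocabulary):
    """Crude plausibility check: at least half of the recognized words are known."""
    words = result.lower().split()
    if not words:
        return False
    known = sum(1 for w in words if w in vocabulary)
    return known / len(words) >= 0.5

def refine_text_group(group, recognize, vocabulary):
    """Try to repair a "text" group whose recognition result looks like nonsense.

    group: list of stroke ids.
    recognize: hypothetical callable mapping a list of stroke ids to a string.
    Returns (kept_strokes, rejected_strokes).
    """
    if looks_like_text(recognize(group), vocabulary):
        return group, []
    # Leave one stroke out at a time; keep the first subset that reads as text.
    for i, stroke in enumerate(group):
        rest = group[:i] + group[i + 1:]
        if rest and looks_like_text(recognize(rest), vocabulary):
            return rest, [stroke]
    # Nothing helped: treat the whole group as non-text.
    return [], list(group)
```

The rejected strokes could then be relabeled as diagram strokes or offered to a neighboring group.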
The grade for recognition this week will be a function of how
sophisticated your approach to recognition is and how well it works.
If you implement a trivial method (e.g., what we did in
class) you will not get very much credit. You don't need to
go overboard, but you should put some thought into your approach and
try to improve on the trivial, basic approaches (again, basing your
approach on something we read in class is a good start).
Note that you MAY NOT use the built-in Ink Analysis for separating text
from drawings for this assignment. :)
Supplemental Tools and Data
To aid with this assignment, I've provided a number of datasets and
some utilities to help you work with this data:
In all of the above datasets, the files are named according to user.
That is, the first four digits in the file name are the user id.
This will help you if you want to do any user-specific learning
or simply compare results across users.
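If you want to split the data by user, the naming convention makes that easy. The following Python sketch groups file names by their four-digit prefix. The function name and any file names you pass in are illustrative only, and files that do not start with four digits are skipped.

```python
import re
from collections import defaultdict
from pathlib import Path

def group_by_user(filenames):
    """Map each four-digit user id prefix to the file names that start with it."""
    by_user = defaultdict(list)
    for name in filenames:
        match = re.match(r"\d{4}", Path(name).name)
        if match:  # skip files that do not follow the naming convention
            by_user[match.group()].append(name)
    return dict(by_user)
```

With the files grouped this way, you could, for instance, train on some users and test on others, or report results per user.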
For Option 2, you can use the labeled XML data files for analog
circuits to start out, but you should also try out your algorithm on
analog circuit data (see below). To do this, you'll have to
extract sketches from the analog circuit notes, save them in their own
journal files, and then convert them. We'll go through this
process in class.
For Option 1, you can use the notes files intact, but you'll have to convert them to SketchML format first.
Advanced Functionality
If you get the basic functionality working, here are several optional
extensions you can try:
- Option 1: Recognize the shapes that you group; Option 2:
Recognize some structure in the text that you find (e.g., lines of
text, bulleted lists, etc.)
- Provide a dynamic error correction interface. For
example,
allow the user to correct grouping mistakes through some kind of
interactive interface, then re-recognize the surrounding area based on
this new information.
- Compare and contrast more than one method of doing grouping
- Fully implement some of the more "advanced" techniques for
grouping (talk to me about this beforehand)
Testing and Writeup [40 points]
Once again, a major component of this project will be your writeup.
Submit the same three files as in the previous assignments: a
README file, a Design file and a Testing file. Details on what
should be in each are given below. BE SURE TO INCLUDE ALL THREE FILES AND ANSWER ALL THE QUESTIONS BELOW.
Readme file
In a file called README.txt, describe how to run your program, how to
use it, which option you chose, any missing features or bugs that you
are aware of, and any extra credit that you implemented.
Design/approach Justification
In the first part of your writeup, you will analyze your design.
In a file called "Design.txt", write up answers/explanations for
the following (be sure to explicitly address all three points):
- Describe your algorithm.
- How did you develop your algorithm? Give a rationale for how you ended up with your current approach.
- Give an overview and critique of your design from a software
engineering perspective. What did you do better than in previous assignments? What would you change next time?
Testing and Results
In the second part of your writeup, you will report on how well your
algorithm performs. You can focus on qualitative data, but please
BE SPECIFIC when discussing what works and what does not.
Test your algorithms on a variety of real-world data and report
qualitative recognition results by giving examples of a few exemplary
runs as screen shots in your writeup. That is, choose a few
examples that illustrate the functionality of your approach and, in
your report, describe why you chose the example and what it
illustrates. Your goal is to give me a good idea of the strengths
and weaknesses of your approach. You may include one or two
"canned" examples (that you draw yourself) if necessary, but try to
include as much real-world data as possible. You'll probably want
to include 4-6 different runs, but choose as many as you feel you need
to illustrate what's happening in your algorithm.
A note on domain: you should include at least one example from each
domain (digital and analog) even if your algorithm was optimized for
one domain or the other, and comment on how well your algorithms
generalize across domains.
Your testing file this week will probably be considerably longer than
other weeks (especially if you count the screen shots in the page
count).
Criteria
Your writeup will be graded on the following criteria:
- Overall completeness: Did you provide all the information requested above?
- Soundness/completeness of your Justification: Is your design
justification compelling (both for your algorithms and your code)?
Are there solid reasons for your decisions? Even if your
design is not ideal, do you clearly identify what could be improved?
- Depth and Breadth of results: Do you discuss your results at a
non-trivial level in order to illustrate the high-level strengths and
weaknesses of your algorithms? Have you tested your code on a
wide variety of real-world examples?
- Clarity of expression: Is your writeup easy to read and grammatically sound?
What to Submit
Through Sakai, submit the following:
- Your code, including everything I need to run it
- The three writeup files described above