Harvey Mudd College
Computer Science 153
Assignment 4
Demo due Friday, October 24, by 4:30pm
Write-up due Sunday, October 26th, by 11:59pm
Automatic image mosaicking
Thanks to A. Efros for the inspiration (and details!) of this assignment
Goals
For this assignment, you will create a system that will automatically
extract and match features between two images. Then, you'll draw on your work
from assignment 3 to create a mosaic from those images. As an additional feature,
you might extend your system to handle more than two images automatically. Other
bells and whistles are suggested below.
The end goal of this assignment is to create an interface that allows the
user to specify two images. The system should then
- extract Harris corners (see the Matlab code provided below)
- determine a subset of the Harris corners to use
- compute a feature descriptor for each of those corners
- match those features between the two images
- use a method robust to outliers (RANSAC) to compute the best resulting homography
- create the resulting mosaic
The paper
This project is an implementation of part of the paper Multi-Image Matching using Multi-Scale Oriented Patches by Brown et al. (2005). However, we will make several simplifications. Read the description below and then look over the paper, making sure you understand the sections this project asks you to implement! We will also discuss some of these techniques in more detail in class.
Tasks
- extracting corners: Start with the Harris Interest Point (corner) Detector
(Section 2). We won't worry about multi-scale processing; rather, we will use only the highest-resolution scale
of the initial image. Also, don't worry about sub-pixel accuracy. Re-implementing Harris is a thankless task, so you can use Alyosha Efros's sample code (for those using Matlab):
harris.m. OpenCV has an API call cvCornerHarris that populates an output image with each pixel's Harris corner strength. Use a neighborhood size of 7 and the default block size of 3. You will need to keep only local maxima: pixels where the corner strength is greater than at any of the eight neighbors. Also, omit corners found in the outermost 22 columns and rows of the image.
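If it helps to see the local-maximum step in one place, here is a minimal sketch in Python/NumPy (the course code is Matlab/OpenCV; the function name here is made up, and only the 22-pixel border and eight-neighbor test come from the assignment text):

```python
import numpy as np

def local_maxima(strength, border=22):
    """Keep only pixels whose Harris corner strength is strictly greater
    than all eight neighbors, ignoring a 'border'-pixel margin."""
    h, w = strength.shape
    corners = []
    for y in range(max(1, border), min(h - 1, h - border)):
        for x in range(max(1, border), min(w - 1, w - border)):
            v = strength[y, x]
            patch = strength[y - 1:y + 2, x - 1:x + 2]
            # only the center pixel may equal or exceed v
            if v > 0 and np.sum(patch >= v) == 1:
                corners.append((x, y, v))
    return corners
```

A brute-force double loop like this is fine at this image scale; in Matlab the same test can be vectorized with ordfilt2 or imdilate.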
- culling the features: Implement Adaptive Non-Maximal Suppression (Section 3). Keep
the 500 features with the greatest radii of support.
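The suppression-radius computation can be sketched as follows (Python/NumPy for illustration; the robustness constant c_robust = 0.9 follows the paper, and the function name is just for this sketch):

```python
import numpy as np

def anms(corners, keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: for each corner, find the distance
    to the nearest corner that is sufficiently stronger, then keep the
    'keep' corners with the largest such suppression radii.
    'corners' is a list of (x, y, strength) tuples."""
    pts = np.array([(x, y) for x, y, s in corners], dtype=float)
    strengths = np.array([s for x, y, s in corners])
    radii = np.full(len(corners), np.inf)
    for i in range(len(corners)):
        # corners that dominate corner i: s_i < c_robust * s_j
        stronger = strengths > strengths[i] / c_robust
        if np.any(stronger):
            d2 = np.sum((pts[stronger] - pts[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    order = np.argsort(-radii)[:keep]
    return [corners[i] for i in order]
```

The globally strongest corner gets an infinite radius, so it always survives; weak corners crowded next to stronger ones get small radii and are culled first.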
- describing each feature: Compute a 64-value descriptor for each feature that remains
after the adaptive non-maximal suppression. Don't worry about rotation invariance; just extract axis-aligned
8x8 patches. Note that it's extremely important to sample these patches from a surrounding 40x40 window to get a nice
big blurred descriptor. I used the pixel
values with horizontal coordinates [ x-21, x-15, x-9, x-3, x+3, x+9, x+15, and x+21 ], along with the same spacing vertically, to
gather the 64 values. Your system should handle changes in intensity (sometimes called "bias/gain normalization") by
subtracting the mean and dividing by the standard deviation of the 64 values. This will result in 64-vectors whose mean
is 0.0 and whose variance is 1.0. (The Matlab function for finding the standard deviation of a
vector v is std(v,1).) Also, just use pixel values; we won't implement the wavelet-indexing approach.
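The sampling and normalization above can be condensed into a few lines; here is an illustrative Python/NumPy sketch (the offsets and normalization come from the assignment text; the function name is made up, and this skips any pre-blur of the 40x40 window):

```python
import numpy as np

def describe(img, x, y, spacing=6):
    """Sample an axis-aligned 8x8 patch at offsets -21, -15, ..., +21
    around (x, y), then bias/gain-normalize the 64 values to mean 0,
    variance 1.  Assumes (x, y) is >= 22 pixels from every border."""
    offsets = np.arange(-3.5 * spacing, 3.5 * spacing + 1, spacing).astype(int)
    patch = img[np.ix_(y + offsets, x + offsets)].astype(float)
    desc = patch.flatten()
    # population standard deviation, i.e. std(v,1) in Matlab
    return (desc - desc.mean()) / desc.std()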
- matching features: Implement Feature Matching (Section 5). That is, you will need to find pairs of features that look
similar and are thus likely to be good matches. If you're using Matlab, you may find dist2.m useful for fast distance
computations. For thresholding, use the simpler approach due to Lowe of thresholding on the ratio between the first
and the second nearest neighbors; consult Figure 6b in the paper when picking the threshold. Section 6 does not
need to be part of your implementation. Finally, you should implement the 4-point RANSAC as described in class to compute a robust homography estimate between the two input images.
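Lowe's ratio test amounts to comparing the best and second-best distances for each feature; a minimal Python/NumPy sketch (the ratio value 0.6 here is just a placeholder — pick yours from Figure 6b):

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.6):
    """Match descriptors by squared distance, keeping a match only when
    the best distance is below 'ratio' times the second-best distance.
    desc1, desc2 are (N1, d) and (N2, d) arrays, N2 >= 2; returns (i, j)
    index pairs into desc1 and desc2."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.sum((desc2 - d) ** 2, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[j2]:
            matches.append((i, j))
    return matches
```

The ratio test rejects ambiguous features: when the two closest candidates look almost equally similar, neither can be trusted, even if the best distance is small.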
Note on homographies: If RANSAC chooses four points in which three (or all four) are very close to being collinear, the
homography created from their correspondences will be (almost) singular - and the results will become numerically too
large/infinite. There are many ways to handle this, but one reasonable way is to check the area of the four triangles that can
be formed from the four points that RANSAC chooses. If the smallest of those four areas is less than a threshold, simply throw
out that set of four points. To be as robust as possible, your code should check these areas in both of the two images being
matched, though it almost always suffices to check only one.
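One way to write the degeneracy check described above, as a Python/NumPy sketch (the area threshold of 10 square pixels is an arbitrary illustrative value, not one specified by the assignment):

```python
import numpy as np
from itertools import combinations

def degenerate(pts, min_area=10.0):
    """Return True if any of the four triangles formed from the four
    RANSAC sample points has area below min_area, i.e. at least three
    of the points are (nearly) collinear.  pts is a (4, 2) array."""
    for a, b, c in combinations(range(4), 3):
        v1 = pts[b] - pts[a]
        v2 = pts[c] - pts[a]
        # triangle area via the 2D cross product
        area = 0.5 * abs(v1[0] * v2[1] - v1[1] * v2[0])
        if area < min_area:
            return True
    return False
```

Inside the RANSAC loop, a sample for which this returns True (in either image, if you check both) is simply discarded and redrawn.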
Note to OpenCV users: OpenCV has some idiosyncrasies in how it computes homographies and uses homographies to transform
individual points, as you will want to do in this case. (It's less idiosyncratic in how it uses them to warp images.) For
computing homographies from four or more pairs of corresponding points, use cvFindHomography. For applying the
resulting homography to individual points, you can use cvPerspectiveTransform. The code at this link demonstrates how to create, assign, and use the data structures needed for
these API calls.
Note to Matlab users: Matlab's maketform function only allows 4 corresponding points in creating a homography. You may use
the script at this link in order to
create the best homography from more than four points (best in the
least-squared-error sense). You will also need
this helper function in the same directory . An example of how to use
these two matlab functions appears at
this link.
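For reference, the least-squared-error homography from four or more correspondences is usually computed with the direct linear transform: stack two linear equations per point pair into a matrix A and take the singular vector of the smallest singular value as the nine entries of H. A Python/NumPy sketch (function names are made up for illustration; this is the plain DLT without the coordinate normalization that production code would add):

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst from N >= 4
    point pairs by solving A h = 0 in the least-squares sense via SVD.
    src and dst are (N, 2) arrays."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)   # right singular vector of smallest sigma
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H, including the perspective divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]
```

This plays the same role as the linked Matlab script or OpenCV's cvFindHomography/cvPerspectiveTransform pair.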
- creating the mosaic: Finally, use your work from the previous assignment to output a mosaicked image that includes
the pixels of both input images.
Write-up
In your write-up, be sure to include pictures of the intermediate results for
two overlapping images of your choice (perhaps two from hw #3), as well as the
final mosaic created. The intermediate results include
- the Harris corners
- the 500 corners preserved after adaptive non-maximal suppression
- the top 50 matches in each of the two images (just show the corners; it's OK
to leave which corners matched which to the imagination)
- the top 5 matches in each of the two images (please use different colors for
corresponding points so that the matches are clear!)
- four of the inlier matches after RANSAC has run
- the resulting mosaic
In addition, you should include a brief description of how you approached the pieces
of the algorithm and where any difficulties arose.
You should include an archive file of your code, as well as
the results from at least two successful runs and one unsuccessful run,
along with an explanation of why the unsuccessful run did not work.
Extra features
For full credit, you should include at least one extra feature for this week's assignment - it can be of
your own choosing or a variant on one of the ideas below. If you do, be sure to include an
example in your write-up and an explanation of the results!
- Enable your system to handle more than two images at once. Here, it will have to
decide which images match with which and will need to choose one coordinate system
to warp the other images into.
- Create a system that determines the "odd image out." That is, when provided with four
images -- three of the same scene and one of a different scene, your system should
be able to cluster them into the appropriate groups of 3 and 1. Credit for this
idea to Sesame Street's "Which one of these is not like the others?"
- Add multiscale processing for corner detection and feature description.
- Add rotation invariance to the descriptors.
- Implement panorama-stitching or panorama-recognition, in which the images wrap around to form a loop.
Demo and write-up
You should demo your system by 4:30pm on Friday, October 24.
Be sure to complete your write-up and link it into
the CS 153 wiki by 11:59pm on Sunday, October 26.