Project Goals

Our project's goal is an autonomous robot that moves around an area, searches for sheets of paper with text on them ("documents"), and photographs each document it finds. A secondary goal is for each photograph to contain the entire sheet and to be readily segmented by Laserfiche's PhotoDocs software.

Our project was divided into three pieces: getting sensory data from two webcams, processing that data to move the robot appropriately, and taking high resolution pictures of located documents. These three processes were entirely distinct (in fact, they were all implemented in different programming languages), and communicated with each other as little as possible.

Sensory System

Due to limitations in both time and the resolution of our cameras, which were unable to detect the text in a document, we decided to go with an approach that would identify white, roughly rectangular shapes. We chose C++ for three reasons: we had a large amount of existing code to leverage, image processing requires some degree of speed, and it was the language in which both of us were most accustomed to using OpenCV.

Because we were detecting white objects, our approach was to mark every pixel in the image with a "whiteness" value: pixels with low saturation and high value had higher scores than those with high saturation or low value. To remove as much noise as possible, we used the formula value^6 * (1 - saturation)^5 as our whiteness value, which suppressed virtually everything that wasn't pure white. We then ran Canny edge detection, contour-finding, and polygonal simplification on the resulting image, as implemented in OpenCV. We removed all non-quadrilaterals, all shapes below a size threshold, and all contours with points within a few pixels of the edge of the image. Of the remainder, we found the center of the largest quad and sent that to the motion system.
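The scoring and filtering steps above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not our C++/OpenCV code. It assumes the polygons have already been produced by the edge-detection and simplification stages and are given as lists of (x, y) vertices. The names whiteness, quad_area, and pick_target, the min_area and margin defaults, and the use of the vertex average as the "center" are all assumptions made for the example.

```python
import colorsys

def whiteness(r, g, b):
    """Score one pixel in [0, 1] from 0-255 RGB channel values."""
    _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # value^6 * (1 - saturation)^5, the formula given in the text
    return (v ** 6) * ((1.0 - s) ** 5)

def quad_area(poly):
    """Area of a simple polygon (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def pick_target(polygons, width, height, min_area=500.0, margin=3):
    """Return the center of the largest acceptable quad, or None.

    min_area and margin are hypothetical thresholds, not the values
    used on the robot.
    """
    best = None
    for poly in polygons:
        if len(poly) != 4:                      # drop non-quadrilaterals
            continue
        if quad_area(poly) < min_area:          # drop small shapes
            continue
        near_edge = any(
            x < margin or y < margin or
            x >= width - margin or y >= height - margin
            for x, y in poly
        )
        if near_edge:                           # drop shapes touching the border
            continue
        if best is None or quad_area(poly) > quad_area(best):
            best = poly
    if best is None:
        return None
    # Assumption: "center" is taken as the average of the four vertices.
    cx = sum(x for x, _ in best) / 4.0
    cy = sum(y for _, y in best) / 4.0
    return (cx, cy)
```

The high exponents are what make the score so selective: a mid-gray pixel with value 200/255 and zero saturation already scores below 0.25, while pure white scores 1.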

We used two cameras: a lower one for centering a document before taking its picture, and an upper one for locating more distant documents that the lower camera couldn't see.

Motion Control System

For the system that controlled the robot's motion, we used Python. We were given an interface to the robot in Python, and had both worked with it previously, so there was no real advantage to using another language.

Because the primary function of the motion system is homing in on located papers, it first checks if the lower camera found something; if so, it turns to center it horizontally, then moves forward or backward to center it vertically. Once the paper is centered, it signals the photography system to take a picture, stays still for a bit, and then turns a specified angle to seek out another paper.

If the lower camera detects nothing, the motion system then checks the upper camera. Again, if something is detected, the robot turns until the paper is horizontally centered, then drives toward it until the lower camera picks it up or the upper camera loses sight of it.
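The camera priority described above can be sketched as a single decision function. This is an illustrative reconstruction, not our actual control code: the function name choose_action, the action strings, the frame size, and the tolerance tol are all hypothetical. It also assumes that image y grows downward and that a document appearing above the vertical center of the lower camera's frame is farther away, so the robot drives forward to center it.

```python
def choose_action(lower, upper, frame_w=320, frame_h=240, tol=15):
    """Pick the next action from the two cameras' detections.

    lower and upper are (x, y) document centers in pixels, or None if
    that camera sees nothing. All parameter defaults are assumptions.
    """
    cx, cy = frame_w / 2.0, frame_h / 2.0

    if lower is not None:
        x, y = lower
        # Center horizontally first, then vertically.
        if x < cx - tol:
            return "turn_left"
        if x > cx + tol:
            return "turn_right"
        if y < cy - tol:
            return "forward"
        if y > cy + tol:
            return "backward"
        return "take_picture"

    if upper is not None:
        x, _ = upper
        if x < cx - tol:
            return "turn_left"
        if x > cx + tol:
            return "turn_right"
        # Horizontally centered: approach until the lower camera sees it.
        return "forward"

    return "wander"
```

For example, choose_action((100, 120), (300, 10)) returns "turn_left": the lower camera's detection always takes priority over the upper camera's.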

If neither camera finds anything resembling a document, the robot wanders semi-randomly until it finds one. This randomized motion is biased in two ways. First, it is biased toward the direction in which it last saw a document, in case it loses sight of one during the centering phase. Second, it is biased toward the direction in which it last wandered, to avoid some of the problems with fully randomized robot motion (what Stephen dubs "cyber-epilepsy").
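One way to realize this kind of two-way bias is a weighted coin flip between turning left and turning right. The sketch below is an assumption about how such a bias could be implemented, not a transcription of our code; the function name pick_wander_turn and the weights seen_weight and wander_weight are hypothetical.

```python
import random

def pick_wander_turn(last_seen_dir, last_wander_dir, rng=random,
                     seen_weight=2.0, wander_weight=1.5):
    """Choose a turn direction: -1 for left, +1 for right.

    last_seen_dir is the side on which a document was last seen and
    last_wander_dir is the previous wander direction; either may be
    None when there is nothing to remember. The weights are
    illustrative values only.
    """
    weights = {-1: 1.0, 1: 1.0}
    if last_seen_dir in weights:
        weights[last_seen_dir] *= seen_weight       # bias toward last sighting
    if last_wander_dir in weights:
        weights[last_wander_dir] *= wander_weight   # bias toward last wander
    p_left = weights[-1] / (weights[-1] + weights[1])
    return -1 if rng.random() < p_left else 1
```

With both memories pointing right and the default weights, the robot turns right with probability 3/4, so it still explores but does not flip direction on every step.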

Photography System

When the motion system has centered on a document, it triggers the photography system to take the high-resolution picture. This system consists of a digital camera attached to the robot, a servo motor that presses the camera's shutter button (the camera has no electronic interface), and an Arduino I/O controller. The controller is programmed in its own language, which is nearly identical to C. As this was the hardware that was available, we used that language for programming this system.

Although there was a learning curve to working with the servo motor, we managed to get it working such that the motion system need only send a signal over the Arduino's USB interface to trigger the digital camera. However, jostling of the components and our dependence on flexible duct tape to provide an inflexible structure limited the system's ability to function in conjunction with the rest of the robot. We are convinced that, given some amount of rigid material such as sheet steel, we could modify the system to work as desired.
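On the motion-system side, sending that trigger signal can be as small as writing one byte to the serial connection. The sketch below is hypothetical: the command byte b"P" and the function name trigger_camera are invented for illustration, since the actual protocol is not recorded here. It accepts any writable binary stream, so an in-memory buffer stands in for the Arduino's serial port; on the robot, an open serial port object (for example from the pyserial package, which is an assumption) would take its place.

```python
import io

TRIGGER_BYTE = b"P"  # hypothetical command byte, not the real protocol

def trigger_camera(port, command=TRIGGER_BYTE):
    """Send the trigger command; return True if every byte was written.

    port is any open binary stream with write() and flush().
    """
    written = port.write(command)
    port.flush()
    return written == len(command)

# Stand-in for the serial link, so the example runs without hardware.
fake_port = io.BytesIO()
ok = trigger_camera(fake_port)
```

Keeping the interface this narrow fits the design described earlier, in which the three subsystems communicate with each other as little as possible.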


Were we to undertake this project again, we would most likely work with the Evolution ER-1 from the beginning, as much of our trouble came from the robot of our choice, the Nomad, failing to work within a reasonable amount of time. We did learn from this, however, that it is possible to effectively use even low-resolution sensory systems to recognize objects, and that while it is quite possibly the most useful non-water substance in the world, duct tape is not the ideal material for rigid-structure construction.