A Robotic Printer

towards an implementation of something awesome...





We plan to make a robotic printer. It will print on the ground using either traditional paints (spray cans/pens/markers) or by dropping items on the ground (such as varyingly tarnished pennies).

We tentatively plan to use a roomba as the base platform for our robot.

Week to Week Progress

Final Weeks

We modified our inverse kinematics from our previous update. Previously, we tried to follow straight lines between the current pen position and the destination. However, to do this accurately, the velocity of the robot's wheels must be continuously varied, and we found that changing the wheel speed too frequently resulted in undesirably jagged motion. To fix this problem, we decided instead to find an arc between our current position and destination that we can travel along with constant wheel speeds.

We find this arc based on the fact that the robot, and thus the pen, can rotate around any point along the line defined by the two wheels. Therefore, we can always find a circle centered on this line that contains both the current pen position and the destination. To find the center of this circle, we intersect the line through the robot's wheels with the perpendicular bisector of the line segment joining the current pen position and the destination. We can then calculate the constant wheel speeds that move us around this circle.
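The arc construction above can be sketched in a few lines of geometry. This is a minimal illustration, not our robot's actual code: the function name `arc_to`, the pose representation `(x, y, heading)`, and the `wheelbase` parameter are our own choices here.

```python
import math

def arc_to(robot, pen, dest, wheelbase):
    """Find the turning-circle center and the constant wheel-speed ratio
    that carries the pen from `pen` to `dest` along a single arc.
    `robot` is (x, y, heading); the pen is rigidly attached to the robot."""
    x, y, h = robot
    ax, ay = -math.sin(h), math.cos(h)           # wheel-axis direction (robot's left)
    mx, my = (pen[0] + dest[0]) / 2, (pen[1] + dest[1]) / 2
    dx, dy = dest[0] - pen[0], dest[1] - pen[1]
    denom = ax * dx + ay * dy
    if abs(denom) < 1e-9:                        # destination dead ahead: drive straight
        return None, 1.0
    # Intersect the wheel-axis line with the perpendicular bisector
    # of the pen->destination segment to get the circle center.
    t = -((x - mx) * dx + (y - my) * dy) / denom
    center = (x + t * ax, y + t * ay)
    # Wheels sit at signed radii t -/+ wheelbase/2; speeds scale with radius.
    ratio = (t - wheelbase / 2) / (t + wheelbase / 2)   # v_left / v_right
    return center, ratio
```

For a robot at the origin facing +x with the pen 10cm ahead, a destination directly to the pen's left yields a center on the wheel axis and a negative speed ratio (the inner wheel reverses for a tight turn).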

Because it doesn't require constantly changing the robot's wheel velocities, this approach allows for much smoother motion and less jagged pictures. To facilitate this, we also significantly reduced the number of points in our images. To move from point to point, we set an update delay of about 1 second. We then calculate approximately how far towards the next point we can move in the next second and choose a destination on the line between our current position and the next point. We then move to this destination using the approach described above. Our robot uses a speed parameter to determine how fast to go, but if going at its normal speed would cause it to overshoot the next point in the image, it will reduce its speed to reach the point exactly instead. After the delay time, the robot's position is updated from its odometry and a new destination is selected, until all of the points in the image have been reached. Because the odometry is not accurate for long, the robot will periodically re-localize using the vision component.

After the modifications to the vision system in the last update (optimized color identification), we decided that it would not be possible to get sufficiently accurate positions in real time. We no longer poll the vision system as fast as the robot updates; instead, the robot only occasionally attempts to localize itself through vision in order to correct drift. Also, the robot uses several frames from the camera and averages the resulting positions together rather than just using a single image.

Because we no longer have to worry about real-time performance, a much more accurate computation of the position of the robot can be done. First, the image is now corrected for lens distortion, an operation which was previously too expensive to be done in real time on our laptop. Additionally, we blur the image to remove fine noise, and we subtract out a captured background so that we only consider areas that have changed since the background capture. If the robot is not included in the background capture, then as long as it is visible it will be in one of the changed regions.

Finally, we use the previous optimized selection mechanism but with one change: rather than taking the center of mass of the region matching the desired color characteristics, we calculate contours for the thresholded image and take the center of mass of the largest contour. This method is much more robust against noise, since small random noise far away from the robot's position will not affect the largest contour but will affect a global average.

Here are a couple of pictures and a video of our system in action:

We also fixed a few bugs during our final weeks. Most notably, we added a delay to let the robot stand still before reading back odometry. This dealt with a lag that was causing us to draw in waves and sawtooth patterns.

Overall, I would say that our robot was a great success.

Here is our code in its final version: Final Code

Week 7

Code for this week

The move to an offboard camera:

Localization now uses an offboard camera. Compared to localization with posts, the task is greatly simplified. In particular, if it is assumed that the robot can only move along a plane, which is true of the Roomba, then provided that the plane is not parallel to the view vector of the camera, there is a one-to-one mapping between points in the camera's image and points on the plane. Hence, computing the robot's position is only a matter of detecting its position within the image and then transforming those coordinates into the coordinate system of the plane.

The actual identification of the robot is fairly straightforward, based on colored markers placed on the robot itself; however, optimizing the routines to reach an acceptable level of performance was not trivial. The first technique we used was to correct the image for distortion, blur it, convert it to HSV, select colored regions in it twice based on calibrated ranges, and then find the contours of the selected regions to locate the markers. Needless to say, this was very slow, and worse, it slowed down sporadically, sometimes freezing for up to a few seconds before the images updated again.

Stripping out everything, including HSV, and selecting only based on RGB yielded almost acceptable results. The video we have of directing the robot is based on this system (our camera stopped working before we could get a video of the improved system, yet to be described).

However, it was clear that the selection task alone had to be optimized, and ideally the HSV conversion would be brought back into the process. The solution was to precompute a large three-dimensional table of quantized RGB values and whether they are within the selection range. The table is of the form table[r][g][b], and each entry is zero or one, specifying whether those RGB coordinates, when converted to HSV, would be within the selection range. Both the HSV conversion and selection tasks are significantly sped up by combining them into a single table.

The actual implementation stores two entries per RGB value, corresponding to the two selection ranges for the blue and green dots, and both selections are done simultaneously to further optimize the process. The table is 30x30x30x2, or 54,000 entries; compare this to the 320x240x3 = 230,400 entries in the representation of one of the images, and the size is not so large. The table is computed very quickly; it takes far less than one second. With this method, HSV conversion and selection are sped up so much that the system's bottleneck becomes acquiring the image from the camera itself rather than the processing.
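A sketch of the table precomputation, using Python's standard `colorsys` module (which works in 0-1 HSV coordinates). This single-range version omits the second (green) channel and the min>max wraparound; the range values below are illustrative, not our calibrated ones.

```python
import colorsys

BINS = 30  # quantization per channel, as in the 30x30x30 table described above

def build_table(h_range, s_range, v_range):
    """Precompute, for every quantized RGB value, whether its HSV
    conversion falls inside the given ranges (each (lo, hi) in [0, 1])."""
    inside = lambda x, rng: rng[0] <= x <= rng[1]
    table = [[[0] * BINS for _ in range(BINS)] for _ in range(BINS)]
    for r in range(BINS):
        for g in range(BINS):
            for b in range(BINS):
                h, s, v = colorsys.rgb_to_hsv(
                    r / (BINS - 1), g / (BINS - 1), b / (BINS - 1))
                table[r][g][b] = int(inside(h, h_range) and
                                     inside(s, s_range) and
                                     inside(v, v_range))
    return table

def select(table, r, g, b):
    """Classify an 8-bit RGB pixel with one lookup (no per-pixel HSV math)."""
    q = lambda c: c * (BINS - 1) // 255
    return table[q(r)][q(g)][q(b)]
```

The 27,000-entry build loop runs in well under a second, after which per-pixel classification is a single indexed lookup.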

The second part of the vision system, transforming from image coordinates to plane coordinates, is significantly simplified by assuming that the plane can only tilt towards or away from the camera; there can be no component of the tilt that varies horizontally. Based on a tilt slider, the normal and two tangent vectors specifying the plane's coordinate system are calculated. Conversion from image to plane coordinates is then simply a matter of intersecting the line traced from the desired location in the image with the plane and projecting the 3d location on the plane thus obtained onto the tangent vectors.
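The image-to-plane transform above can be sketched as a ray-plane intersection. The frame conventions here (camera at the origin looking down +z, tilt about the image's horizontal axis, centered pixel coordinates) are our own illustrative assumptions, not the exact code on the robot.

```python
import math

def image_to_plane(u, v, focal, tilt, dist):
    """Map centered image coordinates (u, v) to ground-plane coordinates.
    The plane may only tilt toward/away from the camera (angle `tilt`
    about the horizontal axis); `dist` is the plane distance / scale."""
    n = (0.0, math.sin(tilt), -math.cos(tilt))     # plane normal
    t1 = (1.0, 0.0, 0.0)                           # horizontal tangent
    t2 = (0.0, math.cos(tilt), math.sin(tilt))     # "up the plane" tangent
    p0 = (0.0, 0.0, dist)                          # reference point on the plane
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    ray = (u, v, focal)                            # ray through the pixel
    s = dot(n, p0) / dot(n, ray)                   # intersect ray with plane
    hit = tuple(s * c for c in ray)                # 3-D point on the plane
    rel = tuple(h - p for h, p in zip(hit, p0))
    return dot(rel, t1), dot(rel, t2)              # project onto the tangents
```

With zero tilt this reduces to the familiar similar-triangles scaling: a pixel at u = 160 with focal length 320 and plane distance 100 lands at plane coordinate 50.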

In principle, the plane can also vary in distance from the camera and the camera can vary in terms of its focal length, but due to the nature of the perspective transform, the plane distance and the focal length both determine the same parameter, which is only a scaling factor.

In practice, there are two settings that must be manually calibrated: the plane tilt, and the scale (distance). The user tilts a grid superimposed on the image until it aligns with convenient parallel lines in the image, and then the user sets the scale until the grid spacing is equal to some reference mark in the environment (such as a 10cm strip of paper, or 1ft long tiles).

Inverse Kinematics

To control our robot's motion, we create an "image" consisting of a sequence of points that we want the pen to travel between, as well as an indication at each point whether the pen should be up or down on the way to that point. By visiting these points in succession, our robot creates a "connect the dots" picture.

Since our objective is to move the pen along a specific path, we must solve the inverse kinematics of the pen-robot system. To do this, we decompose the desired pen motion into a "forward" and a "horizontal" component with respect to the robot's current orientation. Moving the pen forward is simple because a forward motion of the robot corresponds to an identical forward motion of the pen. A horizontal pen movement is more difficult, however, as the robot is not capable of moving sideways. But for small angles, a rotation of the robot is equivalent to a horizontal motion of the pen by the angle rotated times the distance of the pen from the robot. Based on this decomposition, we can determine the desired ratio of forward motion to rotation and move the robot accordingly. After a short time, we update the position of the robot and recalculate the rates of forward and turning motion.

Because our images consist of a large number of points with very little separation between successive points, the robot never needs to turn far on its way to the next location, so the small-angle approximation is valid. To determine its position, the robot uses frequent updates from the hardware odometry, with less frequent absolute localization from the vision component. The inverse kinematics works in simulation (shown above) and has been integrated with the image processing in principle, but it has not yet been fully tested on the hardware. The next step is to finish putting all the pieces together and use the inverse kinematics together with the vision to actually draw pictures.
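The decomposition above can be sketched as follows. This is a toy version under the small-angle assumption; `pen_velocity_command` and its arguments are illustrative names, not the actual controller.

```python
import math

def pen_velocity_command(robot_heading, pen_dist, pen_pos, target, speed):
    """Small-angle inverse kinematics: split the desired pen motion into a
    component along the robot's heading (forward drive) and a perpendicular
    component (turning, via the pen's lever arm of length `pen_dist`)."""
    dx, dy = target[0] - pen_pos[0], target[1] - pen_pos[1]
    # Rotate the desired motion into the robot's frame.
    fwd = dx * math.cos(robot_heading) + dy * math.sin(robot_heading)
    lat = -dx * math.sin(robot_heading) + dy * math.cos(robot_heading)
    norm = math.hypot(fwd, lat)
    if norm < 1e-9:
        return 0.0, 0.0
    v = speed * fwd / norm                  # forward velocity
    omega = speed * lat / norm / pen_dist   # small angle: lateral ~ omega * pen_dist
    return v, omega
```

A target straight ahead produces pure driving; a target directly to the side produces pure rotation, scaled by the pen's distance from the robot.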

The Future

In the future, we want to get rid of the cord and go fully wireless. Also, we must do thorough testing and parameter setting. We also want to make the robot play songs and add multiple pens.

It would also be fun to guide the robot on the fly or otherwise introduce human interaction.

Week 6

Here is a Video of our robot assembly:

Lens Correction

The image from the webcam has pronounced barrel distortion that will interfere with correctly reading the angular positions of the posts if not corrected, so the first step is to undo the distortion to produce an undistorted image. The distortion is assumed to be purely radial; that is, a pixel at location (r, theta) [in polar coordinates] in the undistorted image is moved to location (f(r), theta) in the distorted image. The image is undistorted by applying this inverse mapping to each pixel; f(r) is represented as a third-order polynomial whose coefficients were found by hand through trial and error.
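The per-pixel mapping can be sketched like this: for each pixel of the undistorted output, compute where to sample in the distorted camera image by pushing its radius through the polynomial. The coefficient values below are placeholders, not our hand-tuned ones.

```python
import math

def distorted_source(x, y, cx, cy, k):
    """For a pixel (x, y) of the undistorted output image, return the
    sampling location in the distorted input. Distortion is purely radial:
    (r, theta) -> (f(r), theta) with f(r) = k0*r + k1*r^2 + k2*r^3."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0:
        return float(cx), float(cy)          # center is a fixed point
    fr = k[0] * r + k[1] * r ** 2 + k[2] * r ** 3
    return cx + dx * fr / r, cy + dy * fr / r   # same angle, new radius
```

With k = (1, 0, 0) the mapping is the identity; a small positive cubic term pushes the sampling point outward, which is what corrects barrel distortion.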

HSV Conversion & Selection

The undistorted image is in the RGB color space, which is useful for displaying on monitors but difficult for segmenting objects based on color. The next step is to convert the image into the HSV space, which is done in the normal fashion. The output image is still drawn by the system as if it were RGB, except that the R channel now contains the H channel data, G contains S, and B contains V.

The next step is to produce two images from the HSV-converted, undistorted image: a binary (thresholded) image where white corresponds to the color of the blue strips on the posts, and a binary image where white corresponds to light areas and black corresponds to non-white areas (either because they are too dark or they have a distinct hue).

Both of these images can be produced by running all the pixels of the image through some selection mechanism that identifies pixels in the desired set (e.g., blue pixels). The simplest selection mechanism is an axis-aligned bounding box in HSV space: all pixels that fall within a certain H range, a certain S range, and a certain V range are selected. The bounding box can then be represented with six values specifying the ranges for each of the channels; in our implementation, we also add the ability to set the minimum value of a range above the maximum value, in which case the selector selects everything outside the range (values above the min OR below the max [when min > max]). This is helpful in some cases, such as selecting the hue red, which, due to the HSV mapping, lies at both the smallest and the largest values of H.
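The bounding-box test with the min>max wraparound might look like this (channel values normalized to [0, 1]; function names are ours):

```python
def in_range(x, lo, hi):
    """Channel test with wraparound: when lo > hi, select values OUTSIDE
    the interval (hi, lo) -- i.e. x >= lo OR x <= hi -- which handles
    hues like red that straddle both ends of the H axis."""
    if lo <= hi:
        return lo <= x <= hi
    return x >= lo or x <= hi

def select_pixel(h, s, v, box):
    """Axis-aligned bounding-box selection in HSV space; `box` is
    ((h_lo, h_hi), (s_lo, s_hi), (v_lo, v_hi))."""
    return (in_range(h, *box[0]) and
            in_range(s, *box[1]) and
            in_range(v, *box[2]))
```

For red, a box with the H range (0.95, 0.05) accepts hues near either end of the axis while rejecting everything in between.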

For the blue selection, the hue range is a small segment slightly to the right of the middle of the total H range, the saturation select range is everything except the very bottom of the range (the strips have to have at least some saturation), and the value range is everything except the bottom (zero values produce unpredictable hues).

For the white selection, hue is everything, saturation is only the low end, and value is only the high end. This selects pixels with low saturation (little difference between color channels in RGB) and high value (high largest color channel value in RGB), which tends to select the whitest, brightest pixels as desired.

The actual ranges used must be calibrated to the lighting conditions of each test area, but the screenshot to the right shows a set of calibrated ranges as well as the resulting images from selecting with those ranges.

Contour Conversion

The next step is to identify components in the image. Usually this is done with connected components, but we actually identify components by their contours: that is, the algorithm (not invented by us) follows edges in a thresholded image until they form closed loops and then the algorithm returns those loops. It should be noted that this will catch all of the connected components we are concerned with: all of the components except the background component must have a boundary with the background, and the contour-selection algorithm will return that boundary. Since we expect that our posts' stripes will stand out from the background, this method will be sufficient.

The advantage of contour selection over connected components is that, we've found, it is actually much faster in practice compared even to an efficient (union-find based) connected components implementation. Additionally, contours can easily be used to compute the curvature of the boundary, which can itself help identification; with connected components alone, curvature computation would require a contour-selection pass anyway. The disadvantage is that contour selection will select black components (which connected components does as well), but due to the nature of contours, it is not as straightforward to prune black components. The correct way to do it (which we do not, since black components [that is, black regions within white regions] occur very infrequently) is to compute the average curvature of the contour: the contour selection algorithm winds black and white regions in opposite directions, and the sign of the average curvature corresponds to this winding.

Stack Identification

Only the blue image (the image which has the color blue selected) is actually converted to contours; after contour conversion, a bounding box is computed for each of the contours in the standard way (a contour is simply a collection of (x,y) pairs defining the boundary of the component).

Now, we know that all of our posts have three blue bands stacked vertically on top of each other, so now we search our collection of bounding boxes for 'stacks' of boxes.

We do so by first sorting the bounding boxes from top to bottom based on their top y-coordinates. Next, starting with the highest box, we look at all the boxes lower than it, in order, to see if we can find a match. A match occurs when a box is found of approximately (falling within a certain range of ratios) the same width as the target box and which is positioned approximately (within a certain range expressed as a proportion of the width) below the target box. When the first match is found, the target box has its match set to the found box, and the target box becomes the next box in the list. In this fashion, each box is matched to the closest box below it which can sustain a match. Also, note that noise will not affect the correct matching of two boxes even if that noise is between the two boxes, unless that noise is sufficiently similar to a correct match to be incorrectly matched itself.
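The matching pass above can be sketched in pure Python. Boxes are (x, y, w, h) tuples with y growing downward; the width-ratio and gap thresholds below are illustrative stand-ins for the calibrated ranges we actually use.

```python
def match_stacks(boxes, width_ratio=(0.6, 1.6), gap_frac=1.5):
    """Link each bounding box to the closest box below it of similar
    width and roughly aligned position; returns a dict box -> match."""
    boxes = sorted(boxes, key=lambda b: b[1])         # top-to-bottom by top y
    match = {}
    for i, (x, y, w, h) in enumerate(boxes):
        for cand in boxes[i + 1:]:                    # scan downward, in order
            cx, cy, cw, ch = cand
            if not (width_ratio[0] <= cw / w <= width_ratio[1]):
                continue                              # width too different
            if not (0 <= cy - (y + h) <= gap_frac * w):
                continue                              # not just below the box
            if abs((cx + cw / 2) - (x + w / 2)) > w:
                continue                              # not horizontally aligned
            match[(x, y, w, h)] = cand                # first match wins
            break
    return match
```

Note how a small noise box off to the side fails the width test and is simply skipped, so it neither matches nor disrupts the correct stripe-to-stripe matches.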

In the screenshot in the section on reading the post's id, matching boxes are connected with white lines, and the lowest box of each stack is colored in green rather than red.

Post Selection

Once stacks have been identified, there are two properties that can be used to cull away stacks that do not correspond to the stripes of a post: first, posts have three stripes, so all stacks that do not have three components can be culled. Note, however, that if a stack accidentally captures noise below or above a post, then that post will be mistakenly culled. Second, the posts have a known height and diameter, so their images will share a known aspect ratio. The aspect ratio of a stack is taken to be the ratio of width/height, where the width and height are given by the bounding box that would contain all three boxes of the stack. The danger with using the aspect ratio constraint is that if the camera is tilted, then the posts will not be completely vertical, and the aspect ratio as measured by the large bounding box will be larger than expected. Of course, tilted posts will also interfere in the stack identification step, so camera tilting is a major concern at several steps.

While additional constraints can be imposed, such as using the fact that the stripes occur at known positions relative to one another, selecting posts is a compromise between avoiding false positives and false negatives. Additional constraints reduce false positives by making it less likely that noise or background objects will satisfy all the constraints, but additional constraints also make it more likely that imperfections in the post, camera alignment, HSV selection, and all the other steps in image processing will cause a constraint test to be failed and falsely reject an actual post. The constraints we have chosen are already enough so that false positives are extremely rare; however, false negatives are relatively common, and we do not want to make finding posts any more difficult than necessary.

Reading the Post ID, Distance, Angular Position, and Gray Code

While posts are identified with the blue-selected image, the white-selected image is actually used to calculate the post ID, since the ID region of the post is green / white to avoid having the ID interfere with the post stripe identification. Additionally, selection by value/saturation is more robust than selection by hue, which means that the white-selected image is more accurate than the blue-selected image.

Actually reading the ID is a simple matter. Since the ID occurs between the first and second stripes, the midpoint between these two stripes is calculated. Then all the pixels (in the white-selected image) within a proportion of the stripe width (50%), between the midpoint and the first stripe, are averaged, and the result is thresholded to get the most significant bit. Similarly, the pixels below the midpoint and above the second stripe are averaged and thresholded to get the least significant bit. These two bits are interpreted as a 2-bit binary number to produce the post ID.
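The two-bit read can be sketched as follows. The image is a plain 2-D list of 0/1 values indexed `[y][x]`, and the argument names are our own; the real code operates on the vision system's image buffers.

```python
def read_post_id(white_img, stripe1_bottom, stripe2_top, x_center, stripe_w):
    """Read the 2-bit post ID from the white-selected binary image.
    MSB: average of pixels between the first stripe and the midpoint.
    LSB: average of pixels between the midpoint and the second stripe."""
    mid = (stripe1_bottom + stripe2_top) // 2
    half = int(stripe_w * 0.5 / 2)       # 50% of the stripe width, centered
    def bit(y0, y1):
        vals = [white_img[y][x]
                for y in range(y0, y1)
                for x in range(x_center - half, x_center + half + 1)]
        return int(sum(vals) / len(vals) > 0.5)   # threshold the average
    msb = bit(stripe1_bottom, mid)
    lsb = bit(mid, stripe2_top)
    return msb * 2 + lsb
```

Because each bit is an average over a whole region rather than a scan along one line, isolated noise pixels cannot flip it; more than half the region must be corrupted.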

This binary encoding was our second design, although in retrospect it is both the obvious design and the better one. Our first design was to encode the ID simply as a number of stripes, so that post ID 1 corresponded to 1 stripe, post ID 2 to 2 stripes, and so on. The stripe-number encoding was extremely error prone, since it relied on scanning through the midline and counting the number of edges seen. Thus, any single-pixel noise or bands of noise that occurred within the midline would result in erroneous readings. In contrast, the binary encoding is extremely robust, since the noise must be severe enough to compromise more than half the pixels in the averaging region to flip a bit and cause a post ID reading error. This can still occur, however, especially when reading the ID of distant posts, where there are fewer pixels in the ID region and hence less noise is required to cause an error.

The distance to a post is much simpler to read: first, the height of the post (in pixels) is calculated as just the difference between the bottom y-coordinate of the bottom element of the stack and the top y-coordinate of the top element of the stack. Next, in the regular projective transformation (which we assume the camera image follows), the apparent height of an object varies inversely with its distance. That is, h' = kh/d where k is some constant. Solving for k gives k = dh'/h; we can then find k empirically by measuring the apparent height of an object of known height at a known distance. Once we have calculated k (which only needs to be done once for a camera and a set of undistort parameters), we can find the distance to a post from its apparent height by d = kh/h'. In practice, since we are only measuring distances to posts, we found the combined value kh for a post; the distance then becomes d = (kh)/h', where (kh) = h'd, so only the apparent height and the actual distance need to be known.

The angular position of the post relative to the direction the robot is facing is similarly easy to calculate. Assume that we know the relative angular direction of the camera to the facing of the robot, and call this value phi. Then, given that we have identified a post in the image at x-coordinate x, we wish to calculate the relative angular position of the post to the direction the camera is facing. [Note: we measure x-coordinates relative to the center of the image, so, for instance, in a 320x240 image, the x-coordinate -160 would correspond to the leftmost position in the image, 160 to the rightmost, and 0 dead center]. In the standard projective transform, we can set up a triangle where x is the x-coordinate, theta is the angle to the post, and F is some focal length.

We then notice that tan(theta) = x/F, from the definition of tangent. Now, if we knew F, we could calculate the angular position of the post by theta = arctan(x/F). Luckily, F can be calculated empirically by measuring the angular position and apparent x-coordinate of an object, and then calculating F = x/tan(theta).
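Both measurements boil down to one-liners once the constants are calibrated. A sketch, with kh and F obtained exactly as described above (one measurement of a post at a known distance / known angle):

```python
import math

def post_distance(apparent_height, kh):
    """d = (kh) / h'. The combined constant kh is found once by measuring
    the apparent height h' of a post at a known distance d: kh = h' * d."""
    return kh / apparent_height

def post_bearing(x_centered, focal):
    """theta = arctan(x / F), with x measured from the image center and
    F calibrated from one object at a known angular position."""
    return math.atan2(x_centered, focal)
```

For example, if a post appears 50 pixels tall at 100cm, then kh = 5000, and a later reading of 25 pixels gives a distance of 200cm; a post at the image edge of a camera whose half-width equals its focal length sits at 45°.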

Finally, the angular position of a post relative to the facing of the robot is just given by theta+phi [Note that this value is measured *clockwise* from the robot's facing].

The gray code is read in a different way: once the region of the gray-code has been identified, the midpoints (along a vertical line) of the gray code bits are calculated. Then single pixels from each of those midpoints are sampled and taken to be the bits of the gray code. No averaging can be done horizontally, since the gray code changes with the horizontal position (we only want to read the gray code bits at the center of the band; i.e., facing directly towards the camera). However, a more robust approach would average vertically over the entire gray code bit band height.

In this screenshot, components have bounding boxes drawn, and matched bounding boxes are connected with white lines. The ID of a post is given in green text after the letter P; the distance to the post is given after the letter D, and is expressed in cm * 100 (e.g, 7154 is 71.54cm). The gray code value (out of 128) is the large number in green.

Localization with Monte Carlo

To better integrate with our vision system, we reimplemented the Monte Carlo localization routine in Java. The implementation was similar in spirit to our original Python implementation, but customized for the environment in which we were localizing. To determine the relative weights of the particles, the MCL routine takes four inputs from the vision system: which post was seen, the bearing relative to the front of the robot at which the post was seen, the absolute angle determined from the Gray code on the post, and the distance of the post from the camera. Each of these inputs has a different, empirically determined level of reliability. In our current vision system, the post identification is most reliable, since each post is labeled with a unique 2-bit binary value which is relatively easy to determine. After this, distance measurements were fairly reliable, typically giving estimates accurate to within a few centimeters. Bearing measurements were somewhat problematic due to the difficulty of determining the angle of the post from its position in the camera's field of view. Absolute angle measurements were especially unreliable, as we could only reliably read the first few bits of the Gray code.

To account for the different types of readings when estimating particle probabilities, we compared the observed values with the values each particle should have seen when looking at any of the posts in our map. Determining the expected values is simple because the map consists of only 4 posts laid out on predefined locations on a rectangle. The differences between these values were used to calculate weights for the likelihood of the robot being at the location of each particle through the following basic method:
1) Initialize the particle's weight at 0
2) Find the difference (call it delta) between the measured and expected values for the post we think we've seen.
3) Normalize delta to be approximately between 0 and 1, where 0 indicates perfect agreement with expectations, and 1 indicates complete disagreement.
4) For each delta, compute C / (1/C + delta) and add it to the particle's weight, where C is a parameter indicating how confident we are in the given measurement. This parameter is relatively high for distance measurements and low for absolute angle measurements, for example.
5) Perform the same calculation over the remaining posts that we don't think we saw, but only add a fraction of these values to the particle's total weight.
6) Repeat for all particles, normalize their weights, and cull and resample the population.
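Steps 1-5 above can be sketched for a single particle as follows. The C values and the fraction applied to unmatched posts are placeholders for our empirically tuned parameters, and the function name is ours.

```python
def particle_weight(deltas, confidences, unmatched=(), unmatched_frac=0.25):
    """Weight one particle from normalized measurement errors.
    `deltas` are in [0, 1] (0 = perfect agreement with expectations);
    `confidences` holds one C per measurement type (high for distance,
    low for absolute angle). `unmatched` holds (delta, C) pairs for
    posts we do not think we saw, which count only fractionally."""
    w = 0.0
    for delta, c in zip(deltas, confidences):
        w += c / (1.0 / c + delta)                  # the C / (1/C + delta) scheme
    for delta, c in unmatched:
        w += unmatched_frac * (c / (1.0 / c + delta))
    return w
```

A perfect match (delta = 0) with confidence C contributes C², and the contribution falls off smoothly as the disagreement grows, so no single bad reading zeroes a particle outright.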

Our odometry model was simple and based on the readings we obtained from the robot, namely delta distance and delta theta. We used the model from class of turning, then driving, then turning again to get to the final point. We found that the first and second turns were each equal to one half of delta theta and that the driven distance was d_out = (d_in * sin(theta)) / (theta * sin((pi - theta)/2)).
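The turn-drive-turn update can be written out directly from the formula above (function name and pose representation are illustrative):

```python
import math

def apply_odometry(x, y, heading, d_in, d_theta):
    """Turn-drive-turn model: turn by d_theta/2, drive the corrected
    distance d_out (the chord of the arc of length d_in), then turn by
    d_theta/2 again."""
    if abs(d_theta) < 1e-9:
        d_out = d_in                      # straight-line motion: no correction
    else:
        d_out = (d_in * math.sin(d_theta)) / (
            d_theta * math.sin((math.pi - d_theta) / 2))
    heading += d_theta / 2
    x += d_out * math.cos(heading)
    y += d_out * math.sin(heading)
    heading += d_theta / 2
    return x, y, heading
```

As a sanity check, a quarter-circle arc of radius 1 (d_in = pi/2, d_theta = pi/2) ends at (1, 1) facing pi/2, which matches the geometry.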

Direct Localization By Posts

For calculating the position of the Roomba, we only use the distance to the posts. Knowing the distance to a post, as well as the post's location (we know the locations, and since the posts are identifiable by their unique IDs, we know which post we are looking at), narrows down the location of the Roomba to a circle of points equidistant from the post. If a second post-distance pair is known, the location is narrowed down to the intersection of two circles: at most two points. In general, a third post-distance pair is needed to exactly localize the Roomba. In principle, this localization can be done analytically by solving a system of non-linear equations. In practice, we treat the problem as a minimization problem. For a single post observed to be at a distance of d and known to be at a position of (x_p, y_p), the error function for a point (x, y) is given by E_p = (d - sqrt((x - x_p)^2 + (y - y_p)^2))^2; that is, it is the square of the difference between the expected distance d to the post and the actual distance to the post. For n posts, the error function is simply the sum of the single-post error functions. The point most consistent with the observed distances will then have the smallest error. We minimize this error function with a very naive search algorithm (a recursive uniform sampling), which nonetheless works very well, since the search only has to be done once per set of observations from a location.
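The error function and the recursive uniform sampling can be sketched as below. The search bounds, grid size, depth, and window-shrink factor are illustrative choices, not the tuned values in our code.

```python
import math

def error(point, posts):
    """Sum of squared differences between the measured distances and the
    distances implied by `point`; `posts` is [((x_p, y_p), measured_d), ...]."""
    x, y = point
    return sum((d - math.hypot(x - px, y - py)) ** 2 for (px, py), d in posts)

def localize(posts, lo=(-500.0, -500.0), hi=(500.0, 500.0), depth=8, grid=10):
    """Recursive uniform sampling: evaluate the error on a coarse grid,
    then recurse on a shrunken window centered on the best sample."""
    best = (0.0, 0.0)
    for _ in range(depth):
        samples = [(lo[0] + (hi[0] - lo[0]) * i / (grid - 1),
                    lo[1] + (hi[1] - lo[1]) * j / (grid - 1))
                   for i in range(grid) for j in range(grid)]
        best = min(samples, key=lambda p: error(p, posts))
        # Shrink the window around the best sample (kept a bit wider than
        # one grid cell so the true minimum is not lost) and repeat.
        sx, sy = 2 * (hi[0] - lo[0]) / grid, 2 * (hi[1] - lo[1]) / grid
        lo, hi = (best[0] - sx, best[1] - sy), (best[0] + sx, best[1] + sy)
    return best
```

With three posts and exact synthetic distances, this recovers the true position to well under a centimeter in a few hundred error evaluations.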

Tests involving synthetic data suggest that for four posts each of which has the distance measured with an error of +/- 10cm, the error in the location is also approximately +/- 10cm.

The orientation of the Roomba can also be calculated after the Roomba's position has been estimated. Suppose the Roomba sees a post at an angle of phi clockwise from the direction it is facing, and that, based on its estimated position, the absolute angle (measured counterclockwise from due east) of the post relative to the Roomba is theta.

The Roomba's orientation, as measured counterclockwise from east, is given by theta+phi.

Physically, to see enough posts the Roomba must rotate the camera 180°, turn around 180°, and repeat the viewing process to see the posts behind it, after which it rotates another 180° to return to its starting orientation. If the Roomba does not see enough posts in the process (it needs to see at least three), then it rotates one third of a full rotation and repeats the viewing process in the hope of seeing more posts (it keeps the posts it has already seen in memory). When the Roomba moves, it discards the posts it has seen, since those observations no longer correspond to the current position.

Results / Conclusion

Here is a video of our robot using direct localization:

Although the MCL routine worked well for simulated data, it did not perform as well on the hardware. The direct localization method of rotating at one point and observing as many posts as possible, then deducing a location geometrically from the distance to the posts, achieved much better results in practice. A possible reason for this is the unreliability of several of our measurements. In the future, we may move to an off-board camera and observe the robot directly to overcome these issues.

As can be seen from the video, the direct localization is generally good: it determines the Roomba's position to within about 10cm and its orientation to within +/- 10°. However, for the purposes of drawing pictures, 10cm is not nearly good enough to properly align parts of a drawing, and a 10° error in pose will produce severely crooked lines. Additionally, the small field of view of the camera meant that, ironically, once the Roomba got sufficiently close to a post it could no longer see enough of it to properly identify it, and the post diagonal to it was then too far away to be identified. Since the Roomba could then only identify two posts, it would get stuck in a loop of searching for more posts. Finally, the slow speed of this localization (the video shows only a single localization) is the colloquial nail in the coffin for on-board camera localization; drawing a picture of any complexity will require at most 1cm of error and better than 1s localization times.

Theoretically, our algorithm for robot movement was to run as follows. First, wander around a bit to find out where the robot is with MCL. Then commence drawing the lines. To draw a line, first attempt to drive to the beginning of the line. Once you have driven the correct distance, check your position against the beginning of the line. If you're not close enough, iteratively head towards the beginning of the line until you are. Once you are close enough, rotate yourself until you are facing the correct direction within some tolerance. Once you are oriented correctly, put the pen down and drive to draw the line. Repeat until you have drawn the entire picture. In practice, drawing did not work well at all. In attempts to have the robot draw a square, no recognizable square was ever produced, even though the individual localizations (through direct localization) performed as sub-steps of the drawing process were generally correct.

To ease the creation of new pictures for our robot to draw, we created a small python application for drawing pictures by clicking on the screen to place line segments. The coordinates of these line segments were saved to a text file which could be loaded later. Ideally, this program would also factor in the offset of the pen from the center of the robot.
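The exact text format is not described above, so as a sketch, helpers for a plausible "one `x0 y0 x1 y1` line per segment" format (the format itself is our assumption) might look like:

```python
def save_segments(path, segments):
    """Write line segments ((x0, y0), (x1, y1)) to a text file,
    one segment per line as 'x0 y0 x1 y1'."""
    with open(path, "w") as f:
        for (x0, y0), (x1, y1) in segments:
            f.write(f"{x0} {y0} {x1} {y1}\n")

def load_segments(path):
    """Read segments back from the text file written by save_segments."""
    segments = []
    with open(path) as f:
        for line in f:
            x0, y0, x1, y1 = map(float, line.split())
            segments.append(((x0, y0), (x1, y1)))
    return segments
```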

Week 4/5

After tinkering with electromagnet and permanent-magnet based mechanisms to pick up and drop washers, we have (for the moment) abandoned the washer idea and decided instead to simply have the robot draw with markers. The reason is that we could achieve no precision at all in the dropping, and the magnets tended to pick up nearby washers as well.

Frustrated by having to carry a laptop around behind the robot, we finally mounted the laptop on the robot with velcro. We also built the mechanism that lowers and lifts a pen or marker to allow the robot to draw, and we mounted the camera on a pan/tilt servo assembly so the robot can look at both the ground (the image it is drawing) and its surroundings (for localization). Finally, we wrote the servo control software.

Based on the ideas of ______, we decided to use an 'inverted' form of her tracking-beacon based localization system. Rather than having a post mounted on the robot, we will place three or more posts around the environment; each post will have a code uniquely identifying it, as well as possibly a gray code to indicate the angle from which it is being viewed. Combined with the distance information provided by the apparent size of the post, four different localization methods can be used.

First, with only one post, the distance and gray-code are sufficient to localize relative to that post; if the locations of all three posts are known a priori, each additional post can be used to confirm the localization estimate from all the other posts.

Next, if the pose (that is, the angle of the robot relative to the 'facing') of each post is known through the gray-code, then any two posts can be used to localize the robot by simply taking the intersection of the two lines originating from the posts and oriented in the angles indicated by the gray codes.
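Assuming the positions of both posts and the gray-code angles are known, the two-post method reduces to intersecting two rays. A sketch (the function and its conventions are ours, not our actual implementation):

```python
import math

def locate_from_two_posts(p1, theta1, p2, theta2):
    """Intersect the ray from post p1 at world-frame angle theta1
    (radians) with the ray from post p2 at angle theta2; the
    intersection is the robot's position."""
    x1, y1 = p1
    x2, y2 = p2
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t*d1 = p2 + s*d2 for t via the 2x2 determinant.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        raise ValueError("lines are parallel; these two posts give no fix")
    t = ((x2 - x1) * d2[1] - (y2 - y1) * d2[0]) / denom
    return (x1 + t * d1[0], y1 + t * d1[1])
```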

With three posts, knowing only the distance to each is sufficient to uniquely identify the location of the robot. In fact, with only two posts the number of possible locations can be narrowed down to two, and only one of those can be within the triangle formed by the posts. Therefore, if the robot's motion is restricted to within the triangle, three posts provide three independent estimates of its location.
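The distance-only method is standard trilateration: subtracting the circle equation of post 1 from those of posts 2 and 3 cancels the quadratic terms, leaving a 2x2 linear system in the robot's (x, y). A sketch (the function name is ours):

```python
def trilaterate(posts, dists):
    """posts: three (x, y) post positions (not collinear);
    dists: measured distances to each. Returns the robot's (x, y).

    Subtracting circle 1 from circle i gives
    2*(xi - x1)*x + 2*(yi - y1)*y = r1^2 - ri^2 + xi^2 - x1^2 + yi^2 - y1^2.
    """
    (x1, y1), (x2, y2), (x3, y3) = posts
    r1, r2, r3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("posts are collinear; no unique fix")
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return (x, y)
```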

Finally, if the robot is confined to the triangle, it is not necessary to know the distance or pose of any of the posts, only the relative angular positions of the posts. This only provides one location estimate, however.

Towards these goals, we have worked on vision-based identification of the posts by color. Our current test implementation first converts the image into HSV color space and then selects colors that are 'sufficiently close' to a sample that the vision system is calibrated against: a histogram of HSV color frequencies is built from the sample image, and the HSV range in which 90% of the samples fall is chosen as the selection range. This works reasonably well, but there is still significant noise to contend with. Additionally, when trying to identify the blue tape on the posts, the selection only catches the edges of the tape, for reasons we do not yet understand.
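The calibration step can be sketched with NumPy as below. This is a simplified stand-in for our implementation: it takes the central 90% of each channel independently via quantiles, and it ignores hue wrap-around.

```python
import numpy as np

def calibrate(sample_hsv, keep=0.90):
    """From an (N, 3) array of HSV sample pixels, find per-channel
    [lo, hi] ranges containing the central `keep` fraction of samples."""
    lo = np.quantile(sample_hsv, (1 - keep) / 2, axis=0)
    hi = np.quantile(sample_hsv, 1 - (1 - keep) / 2, axis=0)
    return lo, hi

def select(image_hsv, lo, hi):
    """Boolean mask of pixels whose H, S, and V all fall in range."""
    return np.all((image_hsv >= lo) & (image_hsv <= hi), axis=-1)
```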

However, 'back of the envelope' calculations indicate that just based on the image size of an approximately 30cm tall id code on a post, a distance uncertainty on the order of 1cm will be possible within close distances (3m). Hopefully, a combination of all the localization methods described will be able to reduce this uncertainty, and additionally the image of the floor could be used with Monte-Carlo localization to get greater precision.
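The back-of-the-envelope figure follows from the pinhole camera model: a code of real height H appearing h pixels tall with focal length f (in pixels) sits at distance d = H·f/h, and a one-pixel error in h changes d by roughly d/h. A sketch (the focal length is whatever the camera calibration gives; any numbers used below are illustrative assumptions):

```python
def post_distance(code_height_m, image_height_px, focal_px):
    """Pinhole model: distance = H * f / h."""
    return code_height_m * focal_px / image_height_px

def distance_uncertainty(code_height_m, image_height_px, focal_px):
    """Approximate change in distance per one-pixel error in the
    measured image height: |dd/dh| = H*f/h^2 = d/h."""
    d = post_distance(code_height_m, image_height_px, focal_px)
    return d / image_height_px
```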

Week 3 (February 7, 2007)

This week we got the robot to act based on information it received from the camera. Specifically, we got it to push a roll of blue tape around.

The first step was getting the robot to just follow 'light' around, by which we mean that the robot would compare the average brightness (sum of r, g, and b components) of the left and right halves of its camera input and steer to equalize them. Additionally, we programmed the robot to back up a few inches and rotate about thirty degrees clockwise whenever it hit an obstacle (its bump sensor was activated).
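The left/right brightness comparison can be sketched as follows; the normalized turn command in [-1, 1] is our convention for illustration, not the actual control interface:

```python
import numpy as np

def steer_toward_light(frame):
    """frame: (H, W, 3) RGB array. Returns a turn command in [-1, 1]:
    positive when the right half is brighter (steer right), negative
    when the left half is brighter, based on summed R+G+B brightness."""
    half = frame.shape[1] // 2
    left = float(frame[:, :half].sum())
    right = float(frame[:, half:].sum())
    total = left + right
    if total == 0:
        return 0.0
    return (right - left) / total
```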

In the halls of Case, where the walls are not white, it was quite amusing to watch the robot steer towards the bright light of the windows. In Atwood, however, this technique did not work at all because the walls are painted with glossy white paint.

We next tried restricting the robot's view to the floor in the software by cropping out the top half of the camera input; that worked well as long as the robot did not get too close to the walls, but when it did, the blindingly white walls would enter its field of view and distract it from everything else.

Finally, we found an extremely blue roll of tape and made the robot follow only blue colors (we identified blue colors by the difference between the blue and red channels of the image); the success of this technique can be seen in the posted video. However, the robot's extremely restricted field of view (due mainly to poor camera mounting) meant that as soon as the roll of tape drifted even slightly to the left or right, it would leave the robot's field of vision and the robot would charge blindly forward.
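The blue-minus-red test can be sketched as below; the threshold value and the centroid-based steering hint are illustrative additions, not our exact code:

```python
import numpy as np

def blue_mask(frame, threshold=50):
    """frame: (H, W, 3) RGB array. A pixel counts as 'blue' when its
    blue channel exceeds its red channel by more than `threshold`."""
    b = frame[..., 2].astype(int)
    r = frame[..., 0].astype(int)
    return (b - r) > threshold

def blue_centroid(frame, threshold=50):
    """Mean column of the blue pixels, or None if none are found;
    steering can compare this against the image's center column."""
    ys, xs = np.nonzero(blue_mask(frame, threshold))
    if xs.size == 0:
        return None
    return xs.mean()
```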

Here is a video of our pushing success:

Here is a failed attempt:

Here's a picture of our servo-mounted camera (which we probably won't use). We can move the servos with the computer, though.

Week 2 (January 31, 2007)

We first went to the hardware store and bought spray paint, ten washers, and a battery. We need more washers and probably more colors of spray paint later.

We learned how to drive our actuators (wheels, speaker, LEDs, digital outputs, and low side driver outputs) from a computer. We were able to move the wheels, change the on-board LED, and drive electronics on the digital outputs, though we have had a lot of trouble with the low side driver outputs. We also learned how to read from the web cam and from the vehicle itself. We have yet to create a nice interface for input and output to the robot.
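Driving the wheels goes through the Roomba's Serial Command Interface (SCI), whose DRIVE command (opcode 137) takes a signed 16-bit velocity in mm/s and a signed 16-bit turn radius in mm, both big-endian. A sketch of packet construction, per iRobot's SCI specification (the serial-port plumbing is omitted):

```python
import struct

# SCI opcodes from iRobot's Roomba SCI specification.
START, CONTROL, DRIVE = 128, 130, 137

def drive_packet(velocity_mm_s, radius_mm):
    """Build the 5-byte DRIVE command: opcode followed by big-endian
    signed 16-bit velocity (mm/s) and turn radius (mm). Per the spec,
    a radius of 0x8000 means drive straight; 1 and -1 spin in place."""
    return struct.pack(">Bhh", DRIVE, velocity_mm_s, radius_mm)
```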




This video was the inspiration for our project:


Team Members

Pyry Matikainen
Craig Weidert
Kevin Zielnicki