Image Compression Using a Clustering Algorithm

towards a neural network algorithm for image compression...



Background

For my final project in Neural Networks, I am trying out a couple of methods to compress images. The first is a naive sort of method in which I use a standard neural network with backpropagation. It does not work very well. The second is a clustering algorithm similar to the k-means and Adaptive Resonance Theory schemes. It does much better than the first method.

In particular, I use 48 by 48 pixel grayscale (8-bit) images. I chose this size because it's relatively small (I can run a lot of tests in not too much time) and I already had a number of 48 by 48 AIM icons. I chose the 8-bit grayscale palette because it's much easier to deal with one desired output than a three-channel output vector. It would not be very difficult to extend both of the methods I used to deal with three-channel (or even zillion-channel) color.

To encode this image as a bitmap type file, you would need 48 * 48 = 2304 bytes (plus a little overhead for file business, image size and format, etc). In standard bitmap format, these files were 3,382 bytes. With JPEG compression, the size of the files ranged from 663 bytes (tabula rasa) to 1,913 bytes (seemingly random static).

All of my code is written in Python for rapid prototyping and ease of implementation. As such, it runs somewhat slowly. If you were concerned about speed, then you could write it in something faster. I'm running my code on a computer with a Pentium 4 2.8 GHz processor and 512MB of RAM. I'm running Windows XP.


Attempt #1 - The Failure

The Gameplan:

I will use a standard neural network with backprop training. First, I will create some image that I would like to compress. I will then run various pixels through a standard neural network (I used two hidden layers of 10 logsig functions apiece) with the inputs being the x and y coordinates and the desired output being the grayscale value. Theoretically, if I wanted to save the image, I would just store the weights of all of the neurons. Clearly, this is an ill-conceived plan, as there are many, many weights. To restore the image, I would load the weights and run all of the coordinates of the image through the network, recording the outputs and setting the pixels of the image as appropriate.
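For concreteness, here is a minimal sketch of this kind of network in numpy. The 2-10-10-1 layer sizes match the description above, but the initialization, learning rate, and training loop are generic placeholders rather than my actual code.

    import numpy as np

    def logsig(z):
        # Standard logistic sigmoid ("logsig") activation.
        return 1.0 / (1.0 + np.exp(-z))

    # 2 inputs (x, y) -> 10 -> 10 -> 1 output (grayscale in [0, 1]).
    rng = np.random.default_rng(0)
    sizes = [2, 10, 10, 1]
    weights = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]

    def forward(xy):
        # Run a batch of (x, y) rows through the network, keeping every activation.
        activations = [xy]
        for W, b in zip(weights, biases):
            activations.append(logsig(activations[-1] @ W + b))
        return activations

    def train_step(xy, gray, lr=0.5):
        # One batch of plain backprop on squared error.
        acts = forward(xy)
        delta = (acts[-1] - gray.reshape(-1, 1)) * acts[-1] * (1 - acts[-1])
        for i in reversed(range(len(weights))):
            grad_W = acts[i].T @ delta / len(xy)
            grad_b = delta.mean(axis=0)
            delta = (delta @ weights[i].T) * acts[i] * (1 - acts[i])
            weights[i] -= lr * grad_W
            biases[i] -= lr * grad_b

    # "Compression" is just the weights; "decompression" re-evaluates every pixel.
    coords = np.array([(x / 47.0, y / 47.0) for y in range(48) for x in range(48)])
    target = rng.random(48 * 48)   # stand-in for the real image's gray values
    for epoch in range(1000):
        train_step(coords, target)
    restored = forward(coords)[-1].reshape(48, 48)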

The Algorithm:

This algorithm was a direct implementation of what you would expect from the gameplan using a standard backprop network.

The Results:

The results for this method were plagued with ills. The most troubling of these was the fact that this method seemed to be worthless for most of the images that I would try to compress. On most of the images on which I tried to train the network, the network would quickly settle for an image that was pure gray to minimize the mean squared error. I believe that it was stuck in a local minimum. As advantageous as this was (at least in the short run) for the network, this made the method pretty much worthless as an image compression scheme. Furthermore, for some of the images the method would work only half of the time and gray out the other half of the time.

Even on the images on which the method ``worked,'' the resulting images were not very faithful to the originals. The images would often be extremely distorted in the lower right hand side of the restored image. I thought that this was because the smaller pixel values are much farther apart multiplicatively than the larger values are. For instance, going from pixel value 2 to 3 is a 50% increase, but going from 47 to 48 is a mere 2% increase. However, I tried exponentially parameterizing the xy co-ordinates and that didn't seem to help very much.

Another fundamental problem with this method was that it did not vary the compression level per image. That is, something that was pure white and something that was static would be compressed to the same amount of data. To me it seems that this is a bad strategy, because an image that is pure white should take far less space to store than one full of static.

The most annoying problem with this algorithm was that it was immensely slow. Most of my runs would take at least an hour and to get a good batch of results, I would have to run tests overnight. Anyone wanting to compress an image will not wait for an hour or more to get the job done (especially if it's a 48 by 48 pixel grayscale image).

However, not all was dim with this algorithm. It did yield a number of cool gifs of the algorithm slowly struggling to faithfully reproduce the original image. Here are a couple. Each of the first two is the product of about 30 minutes of computation time. They remind me of a portion of the original Fantasia graphics. The third is an example of what happens in most cases when you try to train.

[Image table: Original Image | Algorithm Chugging Along]

Attempt #2 - The Relative Success

The Gameplan:

I will use a neural network similar to a k-means or Adaptive Resonance Theory network. First, I will create some image that I would like to compress. Secondly, I will use a network to cluster the pixels of the image into a bunch of clusters. Then, I will (in theory) save the clusters, though in practice I just see how many of them there are and how much space it would take to hold them. Then, to recover the image, I will run through all of the possible pixels and, for each pixel, find the closest cluster (based on the xy co-ordinates) and use the color of that cluster.

The Algorithm:

To Compress the Image:
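Roughly, the compression pass might look like the sketch below. It assumes the samples are normalized (x, y, c) vectors and that each prototype is the running mean of the samples it wins; these details are illustrative, not a transcription of my program.

    import numpy as np

    def compress(image, tolerance):
        # image: 2D numpy array of 8-bit grayscale values.
        # Returns a list of prototypes as (x, y, c) triples, all scaled to [0, 1].
        height, width = image.shape
        prototypes = []   # running means of the samples assigned to each cluster
        counts = []
        for y in range(height):
            for x in range(width):
                sample = np.array([x / (width - 1),
                                   y / (height - 1),
                                   image[y, x] / 255.0])
                if prototypes:
                    dists = [np.linalg.norm(sample - p) for p in prototypes]
                    best = int(np.argmin(dists))
                if not prototypes or dists[best] > tolerance:
                    # No prototype is close enough: start a new cluster here.
                    prototypes.append(sample)
                    counts.append(1)
                else:
                    # Fold the sample into the winning prototype (running mean).
                    counts[best] += 1
                    prototypes[best] += (sample - prototypes[best]) / counts[best]
        return prototypes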

To Decompress the Image:
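And the corresponding reconstruction, again only a sketch: each pixel takes the color of the prototype nearest to it in the xy plane.

    import numpy as np

    def decompress(prototypes, width=48, height=48):
        # Rebuild the image: each pixel gets the color of its nearest prototype,
        # where "nearest" is measured in the x, y plane only.
        protos = np.array(prototypes)          # rows of (x, y, c) in [0, 1]
        image = np.zeros((height, width), dtype=np.uint8)
        for y in range(height):
            for x in range(width):
                point = np.array([x / (width - 1), y / (height - 1)])
                dists = np.linalg.norm(protos[:, :2] - point, axis=1)
                image[y, x] = int(round(protos[np.argmin(dists), 2] * 255))
        return image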

The Results:

Compared to the first method, this method was a relative success. I ran this algorithm on a number of grayscale images, both basic and photographic. The algorithm even runs fairly quickly. Using my Python implementation, it takes about one minute to run through all twelve of the runs at various tolerances for a given picture. This includes reconstructing the image and saving a couple of copies of the image for analysis. The slowest run by far is the one at .025 tolerance, which takes about 35 seconds. Of this time, about 10 seconds are for finding the clusters and about 25 seconds are for reconstruction and saving.

The following table gives a visual comparison between the original images and the various images produced at different tolerances. In this table, ``t'' symbolizes the tolerance (the maximum distance that a point (x, y, c) was allowed to be from a prototype without making a new prototype). Underneath each picture is the number of clusters that it took to store that particular version of that image. With these picture parameters (48 by 48, 256 colors), it takes a little less than three bytes (20 bits) to store each cluster. Since in a normal bitmap scheme each pixel is tracked by a byte, anything under 48*48*8/20 ≈ 921 clusters can be considered a savings of space over a straightforward bitmap scheme. JPEG compression will achieve compression equivalent to around 200 to 700 clusters, depending on the complexity of the picture. In the spirit of image compression (as read in my reference paper), rather than use a mean squared error (which doesn't correspond very well to our perception of image faithfulness), I will allow you to see for yourself how faithful the algorithm is at different compressions.
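As a quick check of that storage arithmetic:

    # 6 bits for x, 6 bits for y (48 positions each), 8 bits for the gray value.
    bits_per_cluster = 6 + 6 + 8              # = 20 bits, a little under 3 bytes
    bitmap_bits = 48 * 48 * 8                 # one byte per pixel
    print(bitmap_bits // bits_per_cluster)    # 921 -- the break-even cluster count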

Clusters needed per image at each tolerance (the Goal column showed the original, uncompressed image):

Image Name   Goal  t=.025  t=.05  t=.075  t=.1  t=.15  t=.2  t=.25  t=.3  t=.35  t=.4  t=.45  t=.5
Stripes       N/A     576    192      99    64     32    20     12     8      8     8      8     6
A             N/A     622    249     107    69     38    24     16    12     11     8      7     6
Cross         N/A     792    349     195   141     74    45     27    20     17    13     12     9
Target        N/A     576    193     100    57     30    20     15    11     10     8      7     6
Diag          N/A     634    237     105    74     36    26     20    15     12     8      8     6
Dots          N/A     676    223     110    64     40    25     19    13      9     9      8     7
Circle        N/A     676    229     119    73     40    27     21    12      9     8      6     5
Stick Dude    N/A     646    219     102    63     34    20     16    10      7     6      6     6
Rabbit        N/A    1657    683     320   175     73    40     28    18     14    10      7     6
Blocks        N/A    1376    561     283   161     69    39     24    17     13    11      7     6
Tractor       N/A    1461    582     266   148     62    32     18    14      9     6      5     5
Rice          N/A    1766    735     325   161     69    34     21    14     10     8      7     5
Think         N/A    1446    498     218   119     47    30     18    12      8     6      4     4
Ghosts        N/A    1409    538     262   143     66    31     23    15     10     8      7     5

As you can see from the table, for most images with a wide variety of rapidly changing colors, in order to achieve a reasonable amount of faithfulness in your picture, you must approach the point at which the savings over a bitmap scheme drops to zero. For some applications, in which only the rough idea of the image is needed, that may be acceptable. However, for simple pictures (those with a small number of colors, or with colors grouped together), there is a substantial savings. This of course makes sense: the pictures stored in the 8-bit format don't take advantage of the fact that there are only a few colors. If they did, they would win out in most cases.

It seems to me that most of the potential of this algorithm lies in its ability to achieve a cool mosaic effect.

Future Work

For future work, I would like to test the method on pictures of more varying sizes. Since all of my co-ordinates are normalized, this would be roughly equivalent to decreasing the tolerance level; we have the relation graininess = toleranceLevel * imageSize. There is the trouble that this algorithm runs in O(n^3) time. As the image size goes up, in the worst case (when dealing with random static images), the number of prototypes will also skyrocket and the method will slow to a crawl.

I would also like to use different weightings for the different components of the sample vector. For instance, I might weight the color component more heavily than the spatial components. Perhaps I could even use an entirely different metric when comparing the samples to the prototypes (MSE, for example).
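For example, a weighted distance along these lines (the particular weights here are made up for illustration, not tuned values):

    import numpy as np

    def weighted_distance(sample, prototype, weights=(1.0, 1.0, 2.0)):
        # Euclidean distance over (x, y, c) with the color component counted
        # more heavily; the (1, 1, 2) weighting is only an example value.
        w = np.asarray(weights)
        return float(np.linalg.norm(w * (np.asarray(sample) - np.asarray(prototype))))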

A final possibility for future work is modifying the core of the method itself and shifting to some other algorithm, perhaps one that is more faithful to the original ART variations.

Notes

I tried a method where I iterated through the samples several times, each time weeding out the least popular prototypes. It seemed to have the same effect as iterating through the samples only once.
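The extra weeding step was along these lines (sketch only; the popularity cutoff here is an arbitrary example, not the value I used):

    def prune_unpopular(prototypes, counts, min_count=2):
        # Drop prototypes that won fewer than min_count samples before
        # starting the next pass over the image.
        kept = [(p, c) for p, c in zip(prototypes, counts) if c >= min_count]
        return [p for p, c in kept], [c for p, c in kept]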

Links, Source, &c

Links

My Original Proposal
My Final Presentation
Neural Network Approaches to Image Compression, a good (though a bit dated) resource for general information by Robert D. Dony and Simon Haykin.

Source

A program to run my second modified ART method. Relies on the pic wrapper.
A program to run my backprop neural net. Relies on numpy (for matrices), psyco (for ``speed''), the random math class, the neural nets file, and the pic wrapper.
A pic wrapper for the Python image class that I used. Includes a method to convert 24-bit bitmaps to 8-bit grayscale bitmaps.
A random math class that contained a few functions that I needed for my backprop neural net.
A neural nets file to hold all of my backprop neural net mumbo jumbo.

Pics

The original test images.
The method one failures.
The results from method two.



Contact: Craig Weidert - cweidert at hmc dot edu
This page was created for Professor Keller's Fall 2006 class CS 152 - Neural Networks at Harvey Mudd College in Claremont, CA.