Image Compression Using a Clustering Algorithm

towards a neural network algorithm for image compression...



Background

For my final project in Neural Networks, I am trying out a couple of methods to compress images. The first is a naive sort of method in which I use a standard neural network with backpropagation. It does not work very well. The second is a clustering algorithm similar to the k-means and Adaptive Resonance Theory schemes. It does much better than the first method.

In particular, I use 48 by 48 pixel grayscale (8-bit) images. I chose this size because it's relatively small (I can run a lot of tests in not too much time) and I already had a number of 48 by 48 AIM icons. I chose the 8-bit grayscale palette because it's much easier to deal with one desired output than a three-channel output vector. It would not be very difficult to extend both of the methods I used to deal with three-channel (or even zillion-channel) color.

To encode this image as a bitmap type file, you would need 48 * 48 = 2304 bytes (plus a little overhead for file business, image size and format, etc). In standard bitmap format, these files were 3,382 bytes. With JPEG compression, the size of the files ranged from 663 bytes (tabula rasa) to 1,913 bytes (seemingly random static).

All of my code is written in Python for rapid prototyping and ease of implementation. As such, it runs somewhat slowly. If you were concerned about speed, then you could write it in something faster. I'm running my code on a computer with a Pentium 4 2.8 GHz processor and 512MB of RAM. I'm running Windows XP.


Attempt #1 - The Failure

The Gameplan:

I will use a standard neural network with backprop training. First, I will create some image that I would like to compress. I will then run various pixels through a standard neural network (I used two hidden layers of 10 logsig functions apiece) with the inputs being the x and y coordinates and the desired output being the grayscale value. Theoretically, if I wanted to save the image, I would just store the weights of all of the neurons. Clearly, this is an ill-conceived plan, as there are many, many weights. To restore the image, I would load the weights and run all of the coordinates of the image through the network, recording the outputs and setting the pixels of the image as appropriate.
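For concreteness, here is a minimal sketch of this kind of network in numpy. The 2-10-10-1 layer sizes match the description above, but the initialization, learning rate, and training loop are generic placeholders rather than my actual code.

    import numpy as np

    def logsig(z):
        # Standard logistic sigmoid ("logsig") activation.
        return 1.0 / (1.0 + np.exp(-z))

    # 2 inputs (x, y) -> 10 -> 10 -> 1 output (grayscale in [0, 1]).
    rng = np.random.default_rng(0)
    sizes = [2, 10, 10, 1]
    weights = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]

    def forward(xy):
        # Run a batch of (x, y) rows through the network, keeping every activation.
        activations = [xy]
        for W, b in zip(weights, biases):
            activations.append(logsig(activations[-1] @ W + b))
        return activations

    def train_step(xy, gray, lr=0.5):
        # One batch of plain backprop on squared error.
        acts = forward(xy)
        delta = (acts[-1] - gray.reshape(-1, 1)) * acts[-1] * (1 - acts[-1])
        for i in reversed(range(len(weights))):
            grad_W = acts[i].T @ delta / len(xy)
            grad_b = delta.mean(axis=0)
            delta = (delta @ weights[i].T) * acts[i] * (1 - acts[i])
            weights[i] -= lr * grad_W
            biases[i] -= lr * grad_b

    # "Compression" is just the weights; "decompression" re-evaluates every pixel.
    coords = np.array([(x / 47.0, y / 47.0) for y in range(48) for x in range(48)])
    target = rng.random(48 * 48)   # stand-in for the real image's gray values
    for epoch in range(1000):
        train_step(coords, target)
    restored = forward(coords)[-1].reshape(48, 48)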

The Algorithm:

This algorithm was a direct implementation of what you would expect from the gameplan using a standard backprop network.

The Results:

The results for this method were plagued with ills. The most troubling of these was the fact that this method seemed to be worthless for most of the images that I would try to compress. On most of the images on which I tried to train the network, the network would quickly settle for an image that was pure gray to minimize the mean squared error. I believe that it was stuck in a local minimum. As advantageous as this was (at least in the short run) for the network, this made the method pretty much worthless as an image compression scheme. Furthermore, for some of the images the method would work only half of the time and gray out the other half of the time.

Even on the images on which the method ``worked,'' the resulting images were not very faithful to the originals. The images would often be extremely distorted in the lower right hand side of the restored image. I thought that this was because the smaller pixel values are much farther apart multiplicatively than the larger values are. For instance, going from pixel value 2 to 3 is a 50% increase, but going from 47 to 48 is a mere 2% increase. However, I tried exponentially parameterizing the xy co-ordinates and that didn't seem to help very much.

Another fundamental problem with this method was that it did not vary the compression level per image. That is, something that was pure white and something that was static would be compressed to the same amount of data. To me it seems that this is a bad strategy, because an image that is pure white should take far less space to store than one full of static.

The most annoying problem with this algorithm was that it was immensely slow. Most of my runs would take at least an hour and to get a good batch of results, I would have to run tests overnight. Anyone wanting to compress an image will not wait for an hour or more to get the job done (especially if it's a 48 by 48 pixel grayscale image).

However, not all was dim with this algorithm. It did yield a number of cool gifs of the algorithm slowly struggling to faithfully reproduce the original image. Here are a couple. Each of the first two is the product of about 30 minutes of computation time. They remind me of a portion of the original Fantasia graphics. The third is an example of what happens in most cases when you try to train.

[Image table: Original Image | Algorithm Chugging Along]

Attempt #2 - The Relative Success

The Gameplan:

I will use a neural network similar to a k-means or Adaptive Resonance Theory network. First, I will create some image that I would like to compress. Secondly, I will use a network to cluster the pixels of the image into a bunch of clusters. Then, I will (in theory) save the clusters, though in practice I just see how many of them there are and how much space it would take to hold them. Then, to recover the image, I will run through all of the possible pixels and, for each pixel, find the closest cluster (based on the xy co-ordinates) and use the color of that cluster.

The Algorithm:

To Compress the Image:
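Roughly, the compression pass might look like the sketch below. It assumes the samples are normalized (x, y, c) vectors and that each prototype is the running mean of the samples it wins; these details are illustrative, not a transcription of my program.

    import numpy as np

    def compress(image, tolerance):
        # image: 2D numpy array of 8-bit grayscale values.
        # Returns a list of prototypes as (x, y, c) triples, all scaled to [0, 1].
        height, width = image.shape
        prototypes = []   # running means of the samples assigned to each cluster
        counts = []
        for y in range(height):
            for x in range(width):
                sample = np.array([x / (width - 1),
                                   y / (height - 1),
                                   image[y, x] / 255.0])
                if prototypes:
                    dists = [np.linalg.norm(sample - p) for p in prototypes]
                    best = int(np.argmin(dists))
                if not prototypes or dists[best] > tolerance:
                    # No prototype is close enough: start a new cluster here.
                    prototypes.append(sample)
                    counts.append(1)
                else:
                    # Fold the sample into the winning prototype (running mean).
                    counts[best] += 1
                    prototypes[best] += (sample - prototypes[best]) / counts[best]
        return prototypes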

To Decompress the Image:
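And the corresponding reconstruction, again only a sketch: each pixel takes the color of the prototype nearest to it in the xy plane.

    import numpy as np

    def decompress(prototypes, width=48, height=48):
        # Rebuild the image: each pixel gets the color of its nearest prototype,
        # where "nearest" is measured in the x, y plane only.
        protos = np.array(prototypes)          # rows of (x, y, c) in [0, 1]
        image = np.zeros((height, width), dtype=np.uint8)
        for y in range(height):
            for x in range(width):
                point = np.array([x / (width - 1), y / (height - 1)])
                dists = np.linalg.norm(protos[:, :2] - point, axis=1)
                image[y, x] = int(round(protos[np.argmin(dists), 2] * 255))
        return image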

The Results:

Compared to the first method, this method was a relative success. I ran this algorithm on a number of grayscale images, both basic and photographic. The algorithm even runs fairly quickly. Using my Python implementation, it takes about one minute to run through all twelve of the runs at various tolerances for a given picture. This includes reconstructing the image and saving a couple of copies of the image for analysis. The slowest run by far is the one at .025 tolerance, which takes about 35 seconds. Of this time, about 10 seconds are for finding the clusters and about 25 seconds are for reconstruction and saving.

The following table gives a visual comparison between the original images and the various images produced at different tolerances. In this table, ``t'' symbolizes the tolerance (the maximum distance that a point (x, y, c) was allowed to be from a prototype without making a new prototype). Underneath each picture is the number of clusters that it took to store that particular version of that image. With these picture parameters (48 by 48, 256 colors), it takes a little less than three bytes (20 bits) to store each cluster. Since in a normal bitmap scheme each pixel is tracked by a byte, anything under 48*48*8/20 ≈ 921 clusters can be considered a savings of space over a straightforward bitmap scheme. JPEG compression will achieve compression equivalent to around 200 to 700 clusters, depending on the complexity of the picture. In the spirit of image compression (as read in my reference paper), rather than use a mean squared error (which doesn't correspond very well to our perception of image faithfulness), I will allow you to see for yourself how faithful the algorithm is at different compressions.
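As a quick check of that storage arithmetic:

    # 6 bits for x, 6 bits for y (48 positions each), 8 bits for the gray value.
    bits_per_cluster = 6 + 6 + 8              # = 20 bits, a little under 3 bytes
    bitmap_bits = 48 * 48 * 8                 # one byte per pixel
    print(bitmap_bits // bits_per_cluster)    # 921 -- the break-even cluster count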

Clusters needed per image at each tolerance (the Goal column showed the original, uncompressed image):

Image Name   Goal  t=.025  t=.05  t=.075  t=.1  t=.15  t=.2  t=.25  t=.3  t=.35  t=.4  t=.45  t=.5
Stripes       N/A     576    192      99    64     32    20     12     8      8     8      8     6
A             N/A     622    249     107    69     38    24     16    12     11     8      7     6
Cross         N/A     792    349     195   141     74    45     27    20     17    13     12     9
Target        N/A     576    193     100    57     30    20     15    11     10     8      7     6
Diag          N/A     634    237     105    74     36    26     20    15     12     8      8     6
Dots          N/A     676    223     110    64     40    25     19    13      9     9      8     7
Circle        N/A     676    229     119    73     40    27     21    12      9     8      6     5
Stick Dude    N/A     646    219     102    63     34    20     16    10      7     6      6     6
Rabbit        N/A    1657    683     320   175     73    40     28    18     14    10      7     6
Blocks        N/A    1376    561     283   161     69    39     24    17     13    11      7     6
Tractor       N/A    1461    582     266   148     62    32     18    14      9     6      5     5
Rice          N/A    1766    735     325   161     69    34     21    14     10     8      7     5
Think         N/A    1446    498     218   119     47    30     18    12      8     6      4     4
Ghosts        N/A    1409    538     262   143     66    31     23    15     10     8      7     5

As you can see from the table, for most images with a wide variety of rapidly changing colors, in order to achieve a reasonable amount of faithfulness in your picture, you must approach the point at which the savings over a bitmap scheme drops to zero. For some applications, in which only the rough idea of the image is needed, that may be acceptable. However, for simple pictures (those with a small number of colors, or with colors grouped together), there is a substantial savings. This of course makes sense: the pictures stored in the 8-bit format don't take advantage of the fact that there are only a few colors. If they did, they would win out in most cases.

It seems to me that most of the potential of this algorithm lies in its ability to achieve a cool mosaic effect.

Future Work

For future work, I would like to test the method on pictures of more varying sizes. Since all of my co-ordinates are normalized, this would be roughly equivalent to decreasing the tolerance level; we have the relation graininess = toleranceLevel * imageSize. There is the trouble that this algorithm runs in O(n^3) time. As the image size goes up, in the worst case (when dealing with random static images), the number of prototypes will also skyrocket and the method will slow to a crawl.

I would also like to use different weightings for the different components of the sample vector. For instance, I might weight the color component more heavily than the spatial components. Perhaps I could even use an entirely different metric when comparing the samples to the prototypes (MSE, for example).
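For example, a weighted distance along these lines (the particular weights here are made up for illustration, not tuned values):

    import numpy as np

    def weighted_distance(sample, prototype, weights=(1.0, 1.0, 2.0)):
        # Euclidean distance over (x, y, c) with the color component counted
        # more heavily; the (1, 1, 2) weighting is only an example value.
        w = np.asarray(weights)
        return float(np.linalg.norm(w * (np.asarray(sample) - np.asarray(prototype))))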

A final possibility for future work is modifying the core of the method itself and shifting to some other algorithm, perhaps one that is more faithful to the original ART variations.

Notes

I tried a method where I iterated through the samples several times, each time weeding out the least popular prototypes. It seemed to have the same effect as iterating through the samples only once.
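The extra weeding step was along these lines (sketch only; the popularity cutoff here is an arbitrary example, not the value I used):

    def prune_unpopular(prototypes, counts, min_count=2):
        # Drop prototypes that won fewer than min_count samples before
        # starting the next pass over the image.
        kept = [(p, c) for p, c in zip(prototypes, counts) if c >= min_count]
        return [p for p, c in kept], [c for p, c in kept]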

Links, Source, &c

Links

My Original Proposal
My Final Presentation
Neural Network Approaches to Image Compression, a good (though a bit dated) resource for general information by Robert D. Dony and Simon Haykin.

Source

A program to run my second modified ART method. Relies on the pic wrapper.
A program to run my backprop neural net. Relies on numpy (for matrices), psyco (for ``speed''), the random math class, the neural nets file, and the pic wrapper.
A pic wrapper for the Python image class that I used. Includes a method to convert 24-bit bitmaps to 8-bit grayscale bitmaps.
A random math class that contained a few functions that I needed for my backprop neural net.
A neural nets file to hold all of my backprop neural net mumbo jumbo.

Pics

The original test images.
The method one failures.
The results from method two.



Contact: Craig Weidert - cweidert at hmc dot edu
This page was created for Professor Keller's Fall 2006 class CS 152 - Neural Networks at Harvey Mudd College in Claremont, CA.