This homework assignment is due at 12 AM on Thursday, March 29, 2012 (i.e., the Wednesday/Thursday boundary). Please give your solutions to me, slip them under my door, or e-mail them.
I expect that it will take you about 4 hours to complete the assignment. Please record your actual time so I can get feedback on my estimates.
There is only one problem in this assignment: graphing data. The file prob1.txt contains observations of file sizes on a specific computer. Each line has up to three tab-separated fields: the number of files observed, the file length in bytes, and the file extension (left blank for files that have no extension). For example, the line

    717 0

indicates that 717 files were observed that had no extension and a length of zero, while the line

    28 35 .gif

shows that there were 28 files that were 35 bytes long and had a .gif extension.
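If you script your analysis, the first step is turning each line into numbers. Below is a minimal Python sketch. It assumes the column order (count, length in bytes, extension) shown by the two examples, and it assumes the extension field is either missing or empty for files with no extension. The names `SizeRecord`, `parse_line`, and `read_records` are hypothetical, not part of the assignment.

```python
from typing import List, NamedTuple


class SizeRecord(NamedTuple):
    """One line of the data file: `count` files of `size` bytes with extension `ext`."""
    count: int
    size: int
    ext: str  # empty string when the files had no extension (assumed format)


def parse_line(line: str) -> SizeRecord:
    """Parse one tab-separated line; the extension field may be absent or empty."""
    fields = line.rstrip("\r\n").split("\t")
    count, size = int(fields[0]), int(fields[1])
    ext = fields[2] if len(fields) > 2 else ""
    return SizeRecord(count, size, ext)


def read_records(path: str) -> List[SizeRecord]:
    """Read every non-blank line of the file (encoding is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [parse_line(line) for line in f if line.strip()]


# The two example lines from the assignment text.
print(parse_line("717\t0"))        # SizeRecord(count=717, size=0, ext='')
print(parse_line("28\t35\t.gif"))  # SizeRecord(count=28, size=35, ext='.gif')
```

Keeping the count as its own field, rather than repeating each line `count` times, leaves the data at its original compact size.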
Prepare a graphical presentation, containing between two and four graphs, that tells a story about this data. Use the principles of graphical presentation that were discussed in class or that are given in Tufte and Jain. 80% of your grade will be based on the quality of presentation; only 20% will depend on your analysis and insights.
You may assume that, if I print your results, I will use a color printer.
To help you with your analysis, here are some things you might experiment with (individually or in combination):
You are not limited to the above ideas. Feel free to be creative.
WARNING: When the data is expanded into one line per file, there are approximately 82,000 lines of data. This is too much for many common spreadsheet programs, which often impose an arbitrary limit of 65,536 rows. You should view this homework assignment partly as an opportunity to find a statistical package that can handle large amounts of data, or to develop techniques (such as placing the data into multiple columns) that will work around these unreasonable restrictions.
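Another way around row limits is to avoid expanding the data at all. This technique is not the one suggested above. Most summary statistics can be computed directly from (size, count) pairs by weighting each size by its count. The Python sketch below shows this for the total number of files, an empirical cumulative distribution of file sizes, and a lower median. It also includes a lazy expansion generator for tools that insist on one value per row. All function names are hypothetical, and the input is assumed to be (size, count) pairs such as those obtained by parsing prob1.txt.

```python
from itertools import repeat
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]  # (size in bytes, number of files with that size)


def total_files(pairs: Iterable[Pair]) -> int:
    """Number of files represented, i.e. the length of the expanded data."""
    return sum(count for _size, count in pairs)


def weighted_cdf(pairs: Iterable[Pair]) -> List[Tuple[int, float]]:
    """Empirical CDF of file size as (size, fraction of files <= size).

    Repeated sizes (e.g. the same length seen with different extensions)
    are merged before accumulating.
    """
    merged: Dict[int, int] = {}
    for size, count in pairs:
        merged[size] = merged.get(size, 0) + count
    n = sum(merged.values())
    cdf: List[Tuple[int, float]] = []
    running = 0
    for size in sorted(merged):
        running += merged[size]
        cdf.append((size, running / n))
    return cdf


def weighted_median(pairs: Iterable[Pair]) -> int:
    """Lower median: smallest size s such that at least half of all files are <= s."""
    for size, frac in weighted_cdf(pairs):
        if frac >= 0.5:
            return size
    raise ValueError("no data")


def expand(pairs: Iterable[Pair]) -> Iterator[int]:
    """Yield one size per file, lazily, without building the full list in memory."""
    for size, count in pairs:
        yield from repeat(size, count)
```

For example, the two sample lines from this page, `[(0, 717), (35, 28)]`, represent 745 files with a lower median size of 0 bytes.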
© 2012, Geoff Kuenning
This page is maintained by Geoff Kuenning.