CS 147 Homework Assignment 2

This homework assignment is due at 12 AM on Thursday, February 20, 2003 (i.e., the Wednesday/Thursday boundary). Please give your solutions to me or place them in the box outside my door.

I expect that it will take you about 3 hours to complete the assignment.

If you use a Microsoft product to do your graphing, be sure to turn off all colors (so you can print in black and white) and the stupid gray background.

You are encouraged either to use a standard software tool or to write code to help solve these problems, so that you will have techniques you can use again in the future.
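As one illustration of the kind of reusable code meant here, the following is a minimal Python sketch of a confidence interval for a mean. The function name `mean_confidence_interval` and the sample values are made up for illustration. The sketch assumes the sample is large enough for the normal approximation, so a small sample would need a t-quantile in place of the z-quantile.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.90):
    """Return (low, high) for the mean using the normal approximation.

    Assumption: the sample is large enough (roughly n >= 30) for a
    z-quantile; smaller samples call for a t-quantile instead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    center = mean(samples)
    std_error = stdev(samples) / math.sqrt(n)  # stdev divides by n - 1
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided quantile
    return center - z * std_error, center + z * std_error


# Made-up illustrative values, not data from the assignment files.
example = [float(v) for v in range(10, 50)]
low, high = mean_confidence_interval(example, 0.90)
print(f"90% CI for the mean: ({low:.2f}, {high:.2f})")
```

Calling the same function with `confidence=0.95` gives a wider interval around the same center, which is a quick sanity check on any implementation.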

  1. The sizes (in bytes) of the HTML files for the CS70 homework problems from the fall of 2002 are given in prob1.txt.
    1. What are the 1st and 3rd quartiles for this data?
    2. Are quartiles a good choice to describe dispersion in this case? If not, what would you use instead?
    3. What are the mean file size and the standard deviation?
    4. What are the 90% and 95% confidence intervals for the mean?
    5. Is your calculation valid? Why or why not?
    6. What is the 90% confidence interval for the proportion of files that are less than 20,000 bytes in size? Use the formula given in Jain.
    7. Is the confidence interval for the proportion valid?
    8. At 90% confidence, is the mean file size greater than 16K (16384) bytes?
  2. The raw midterm scores for two sections of CS70 students are given in prob2-1.txt and prob2-2.txt.
    1. Is either section better than the other at 90% confidence? Which?
    2. Is either section better at 80% confidence?
    3. Based on the data from the combined sections, how many students would have to take the midterm if we wanted the mean score to have a 99% confidence interval that was +/- 5% of the mean?
  3. Correct timekeeping is very important in navigation. Traditionally, seafarers never try to reset their clocks; instead they calculate a daily drift rate and apply a correction factor. The file prob3.txt contains a series of observations of the number of days since a particular wristwatch was first started (first column) and the number of seconds of error exhibited by the watch relative to an atomic clock (second column). The columns are tab-separated, so they should be easy to import into a spreadsheet.
    1. Fit a linear regression to this data.
    2. Which of the fitted parameters are significant at 95% confidence?
    3. How much of the variation is explained by the regression?
    4. Based on the R-squared value, is the regression valid?
    5. Using visual tests, verify or refute the validity of your regression model according to the four criteria listed on pages 235-237 of the textbook.
    6. What would the clock error be at day 730.000?
    7. At the equator, a 4-second clock error will produce a navigation error of exactly one nautical mile. On day 730, a navigator crossing the equator uses your regression model to correct the clock reading, then calculates her position. At 90% confidence, what is the plus/minus error introduced by the watch after the correction has been made, expressed in nautical miles?
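For problem 3, a least-squares fit can be written in a few lines if you would rather not rely on a spreadsheet. The sketch below is only one possible approach: the names `fit_line` and `predict` are hypothetical, and the data points are made up rather than taken from prob3.txt. It computes the intercept, slope, R-squared, the standard deviation of errors, and the standard deviation of a single future observation at a given x.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns a dict of results."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return {
        "b0": b0, "b1": b1,
        "r2": 1 - sse / sst,              # fraction of variation explained
        "s_e": math.sqrt(sse / (n - 2)),  # standard deviation of errors
        "n": n, "x_bar": x_bar, "sxx": sxx,
    }


def predict(fit, x_new):
    """Return (predicted y, std. deviation of one future observation)."""
    y_hat = fit["b0"] + fit["b1"] * x_new
    spread = fit["s_e"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat, spread


# Made-up (days, seconds of error) pairs, not the contents of prob3.txt.
days = [0.0, 10.0, 20.0, 30.0, 40.0]
error = [0.1, 2.9, 6.2, 8.8, 12.0]
fit = fit_line(days, error)
y_hat, spread = predict(fit, 50.0)
print(f"slope={fit['b1']:.4f} s/day, R^2={fit['r2']:.4f}")
print(f"predicted error at day 50: {y_hat:.2f} s (std. dev. {spread:.2f} s)")
```

The plus/minus half-width of a prediction interval is the t-quantile with n - 2 degrees of freedom multiplied by the returned standard deviation. The sketch leaves that quantile to a table or a statistics package, since the Python standard library does not provide it.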


© 2003, Geoff Kuenning

This page is maintained by Geoff Kuenning.