This homework assignment is due at 12 AM on Thursday, February 16, 2012 (i.e., the Wednesday/Thursday boundary). Please give your solutions to me, slide them under my door, or e-mail them.

I expect that it will take you about 3 hours to complete the assignment.

If you use a Microsoft product to do your graphing, be sure to turn off the stupid gray background, and ensure that color isn't essential for interpreting the graphs (since I might decide to print things on a B&W printer).

You are encouraged either to use a standard software tool or to write code to help solve these problems, so that you will have techniques you can use again in the future.

- The sizes (in bytes) of a set of HTML files are given in prob1.txt.
- What are the 1st and 3rd quartiles for this data?
- Are quartiles a good choice to describe dispersion in this case? If not, what would you use instead?
- What are the mean file size and the standard deviation?
- What are the 90% and 95% confidence intervals for the mean?
- Is your calculation valid? Why or why not?
- What is the 90% confidence interval for the proportion of files that are less than 20,000 bytes in size? Use the formula given in Jain.
- Is the confidence interval for the proportion valid?
- At 90% confidence, is the mean file size greater than 16K (16384) bytes?
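
If you choose to write code for this problem, the following is a minimal Python sketch of the calculations involved, not a reference solution. It assumes that prob1.txt holds one number per line (the file format is not stated here), and it uses a normal (z) quantile, which is only reasonable for large samples; for a small sample you would substitute a t quantile. The proportion interval uses the usual normal-approximation formula, which I am assuming is the one given in Jain. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev


def load_values(path):
    """Read one number per non-blank line (assumed file format)."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def summarize(data):
    """Return the 1st and 3rd quartiles, the mean, and the sample std. dev."""
    # quantiles() uses the "exclusive" method by default; other quartile
    # conventions can give slightly different answers.
    q1, _median, q3 = quantiles(data, n=4)
    return {"q1": q1, "q3": q3, "mean": mean(data), "stdev": stdev(data)}


def z_quantile(confidence):
    """Two-sided normal quantile, e.g. about 1.645 for confidence=0.90."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci(data, confidence):
    """Large-sample confidence interval for the mean (z approximation)."""
    half = z_quantile(confidence) * stdev(data) / math.sqrt(len(data))
    m = mean(data)
    return m - half, m + half


def proportion_ci(data, threshold, confidence):
    """Normal-approximation CI for the proportion of values below threshold."""
    n = len(data)
    p = sum(1 for x in data if x < threshold) / n
    half = z_quantile(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half, p + half
```

A typical use would be `sizes = load_values("prob1.txt")` followed by `mean_ci(sizes, 0.90)` and `proportion_ci(sizes, 20000, 0.90)`. To test whether the mean exceeds 16384, check whether that value lies below the lower end of the interval. The code does not decide whether these intervals are valid for your data; that part is still up to you.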

- The raw midterm scores for two sections of a class are given in prob2-1.txt and prob2-2.txt.
- Is either section better than the other at 90% confidence? Which?
- Is either section better at 80% confidence?
- Based on the data from the combined sections, how many students would have to take the midterm if we wanted the mean score to have a 99% confidence interval that was +/- 5% of the mean?
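
One way to approach this problem in code is sketched below, under stated assumptions. It treats the two sections as unpaired samples and builds a confidence interval for the difference of their means; if the interval excludes zero, the sections differ at that confidence level. It uses a normal (z) quantile for simplicity, which is an approximation: with class-sized samples, a t quantile with the appropriate degrees of freedom is the more careful choice. The sample-size function uses the standard formula n = (z s / (r x̄))², where r is the desired half-width as a fraction of the mean. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance, stdev


def z_quantile(confidence):
    """Two-sided normal quantile for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def difference_ci(a, b, confidence):
    """Approximate CI for mean(a) - mean(b), treating a and b as unpaired.

    Assumption: the z quantile stands in for the t quantile.
    """
    diff = mean(a) - mean(b)
    std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    half = z_quantile(confidence) * std_err
    return diff - half, diff + half


def required_sample_size(data, confidence, rel_error):
    """Observations needed so the CI half-width is rel_error * mean.

    rel_error is a fraction, e.g. 0.05 for +/- 5% of the mean.
    """
    z = z_quantile(confidence)
    return math.ceil((z * stdev(data) / (rel_error * mean(data))) ** 2)
```

For the last question you would pass the combined scores from both sections, for example `required_sample_size(section1 + section2, 0.99, 0.05)`.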

- Correct timekeeping is very important in navigation. Traditionally, seafarers try to never reset their clocks; instead they calculate a daily drift rate and apply a correction factor. The file prob3.txt contains a series of observations of the number of days since a particular wristwatch was first started (first column) and the number of seconds of error exhibited by the watch relative to an atomic clock (second column). The columns are tab-separated, so they should be easy to import into a spreadsheet.
- Fit a linear regression to this data.
- Which of the fitted parameters are significant at 95% confidence?
- How much of the variation is explained by the regression?
- Based on the R-squared value, is the regression valid? (Show your calculations.)
- Using visual tests, verify or refute the validity of your regression model according to the four criteria listed on pages 235-237 of the textbook.
- What would the clock error be at day 730.000?
- At the equator, a 4-second clock error will produce a navigation error of exactly one nautical mile. On day 730, a navigator crossing the equator uses your regression model to correct the clock reading, then calculates her position. At 90% confidence, what is the plus/minus error introduced by the watch after the correction has been made, *expressed in nautical miles*?
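
If you prefer code to a spreadsheet for this problem, here is a minimal least-squares sketch in Python. It is one possible arrangement, not a reference solution. The function names are hypothetical, and the loader assumes two whitespace-separated numeric columns with no header line. The prediction half-width assumes a single future observation, and it takes the t quantile (with n - 2 degrees of freedom) as a parameter that you look up in a table, since the Python standard library does not provide the t distribution.

```python
import math


def load_pairs(path):
    """Read (day, error_seconds) pairs from a tab-separated file."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            if line.strip():
                day, err = line.split()
                xs.append(float(day))
                ys.append(float(err))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,          # fraction of variation explained
        "std_err": math.sqrt(sse / (n - 2)), # std. dev. of the residuals
        "n": n,
        "x_bar": x_bar,
        "sxx": sxx,
    }


def predict(fit, x):
    """Predicted y at x from a fit returned by fit_line()."""
    return fit["intercept"] + fit["slope"] * x


def prediction_half_width(fit, x, t_quantile):
    """Half-width of the interval for one future observation at x."""
    spread = 1 + 1 / fit["n"] + (x - fit["x_bar"]) ** 2 / fit["sxx"]
    return t_quantile * fit["std_err"] * math.sqrt(spread)


def seconds_to_nautical_miles(seconds):
    """At the equator, 4 seconds of clock error is one nautical mile."""
    return seconds / 4
```

With `fit = fit_line(*load_pairs("prob3.txt"))`, the predicted error on day 730 is `predict(fit, 730)`, and the plus/minus figure in nautical miles is `seconds_to_nautical_miles(prediction_half_width(fit, 730, t))` for the t value you looked up. The sketch does not test parameter significance or perform the visual checks; those still need to be done separately.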

*© 2012, Geoff Kuenning*

This page is maintained by Geoff Kuenning.