CS 147 Homework Assignment 3

This homework assignment is due at 12 AM on Monday, March 3, 2003 (i.e., the Sunday/Monday boundary). Please give your solutions to me or place them in the box outside my door.

I expect that it will take you about 3 hours to complete the assignment. Please record your actual time so I can get feedback on my estimates.

If you use a Microsoft product to do your graphing, be sure to turn off all colors (so you can print in black and white) and the stupid gray background.

You are encouraged either to use a standard software tool or to write code to help solve these problems, so that you will have techniques you can use again in the future. Note that Microsoft Excel has an F-test built in. If you use it, first work one of the small examples in the book and confirm that you get the same result, so that you know you are using the function properly.
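
One way to check a tool's F-test is to compute the statistic by hand for a small one-predictor example and compare the two. The Python sketch below is illustrative only: the function name `f_statistic` and the sample numbers are made up here and come from neither the assignment nor the book. It returns the SSR, the SSE, and the F statistic; the critical value still has to come from an F table or from the tool being checked.

```python
def f_statistic(xs, ys):
    """Return (SSR, SSE, F) for the one-predictor fit y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = mean_y - b1 * mean_x           # intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    ssr = sst - sse
    # One predictor: 1 numerator and n - 2 denominator degrees of freedom.
    f = (ssr / 1) / (sse / (n - 2))
    return ssr, sse, f


# Made-up numbers, purely to exercise the function.
ssr, sse, f = f_statistic([0, 1, 2, 3], [1, 3, 5, 8])
print(f"SSR={ssr:.2f} SSE={sse:.2f} F={f:.2f}")
```

If the tool reports a different F for the same inputs, the usual culprits are swapped ranges or the wrong degrees of freedom.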

  1. A Harvey Mudd professor with too much time on his hands decided to investigate the relationship between dorm assignments, time spent studying, time spent sleeping, and GPA. He collected data on 495 students.

    Each line of the (tab-separated) file has four fields:

    1. Dorm group: either "quad" or "outer".
    2. Number of hours per week spent studying, as reported by the student.
    3. Average number of (self-reported) hours of sleep the student got each night.
    4. The student's GPA for the semester, as reported by the registrar.

    You are to perform a multiple regression analysis on the data, answering the following questions:

    1. What is the formula for GPA in terms of dorm, study hours, and sleep hours?
    2. What is the R-squared value for this regression?
    3. What are the 95% confidence intervals for each of the four regression parameters?
    4. What are the SSR and the SSE? What is the result of the F-test at the 95% level?
    5. Is there any correlation between the dorm group and the study hours?
    6. Is there any correlation between the study hours and the sleep hours?
    7. Based on your answers to the two previous questions, should you modify your regression analysis? If so, what is the new regression equation, including 95% confidence intervals?
    8. Based on a scatter plot of the regression errors, do you see any trends in the data?
    9. Do you think your regression analysis is valid for this data?
  2. A researcher collected a number of observations relating the memory size of a program (the independent variable) to its run time. The data is given in prob3-2.txt, where the first column is the memory size in KB (1 KB = 1000 bytes, not 1024) and the second is the run time in seconds. Using regression techniques, fit an equation to this data. Answer the following questions:
    1. What is the regression equation?
    2. Is your regression valid?
    3. Are the regression parameters significant?
    4. Are your errors normally distributed?
    5. What can you say about the 95% confidence intervals?
    6. What is the predicted run time for a 10-MB program?
    7. What is the predicted run time for a 100-MB program?
    8. How valid are the predictions in the previous two questions?
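
For those who choose to write code, the regression questions above can be approached with a small least-squares routine. What follows is a minimal pure-Python sketch under stated assumptions, not a prescribed method: the names `solve`, `fit`, and `encode` are hypothetical, coding "outer" as 1 and "quad" as 0 is an arbitrary choice, and the confidence intervals use a normal quantile in place of Student's t. That approximation is reasonable for several hundred observations but gives intervals that are too narrow for small samples. Reading the data files is left out.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, ys, level=0.95):
    """Least-squares fit of ys on rows; each row must start with 1.0."""
    n, p = len(ys), len(rows[0])
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    residuals = [y - f for y, f in zip(ys, fitted)]
    mean_y = sum(ys) / n
    sse = sum(e * e for e in residuals)
    sst = sum((y - mean_y) ** 2 for y in ys)
    s2 = sse / (n - p)  # residual variance estimate
    # Diagonal of (X'X)^-1, times s2, gives each parameter's variance.
    diag = [solve(xtx, [float(i == j) for i in range(p)])[j] for j in range(p)]
    # ASSUMPTION: normal quantile instead of Student's t (large-sample shortcut).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = [z * (s2 * d) ** 0.5 for d in diag]
    return {"coef": coef, "half_width": half_width, "r2": 1 - sse / sst,
            "ssr": sst - sse, "sse": sse, "residuals": residuals}


def encode(dorm, study, sleep):
    """Build one design-matrix row; "outer" -> 1 is an arbitrary coding."""
    return [1.0, 1.0 if dorm == "outer" else 0.0, float(study), float(sleep)]


# Made-up two-column example (intercept plus one predictor), not assignment data.
result = fit([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [1, 3, 5, 8])
print(result["coef"], result["r2"])
```

For the memory-size problem, each row would be `[1.0, size_in_kb]`, and under the 1K = 1000 convention a 10-MB program corresponds to a size of 10000. The `residuals` entry can be plotted against the fitted values or the predictors to look for trends. A prediction far outside the range of the observed sizes deserves particular suspicion, whatever the fit statistics say.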


    © 2003, Geoff Kuenning
