Lab 3: House Price Regression

Introduction

In this lab, you'll predict house prices from a set of features such as square footage and lot size.

Data

This will be run as a Kaggle in-class competition. Before you can download the dataset from Kaggle, you'll need to use this invitation link, create a Kaggle account, and accept the competition rules.

Kaggle hosts machine learning competitions (often with cash prizes). Right now, there are nine competitions running with prizes from $10K to $100K for winners. We'll be using a private competition just for our class (with no cash prizes :)). You download the training examples from Kaggle; they include all the features, including the SalePrice. For the test set, Kaggle provides all the features except the SalePrice.

After you've trained your model, you'll make predictions for each of the instances in the test set. You'll create a CSV file with an Id number and the predicted SalePrice and upload it to Kaggle. Kaggle then evaluates the loss on a subset of those predictions and puts your submission on a public leaderboard ranked by loss. You can resubmit up to twice a day to update your place on the leaderboard.
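A minimal sketch of writing the submission file with only Python's standard library. The header names `Id` and `SalePrice` are an assumption based on the description above; check the competition's sample submission for the exact format before uploading. The function name `write_submission` is hypothetical.

```python
import csv


def write_submission(ids, predictions, path="submission.csv"):
    """Write one (Id, SalePrice) row per test instance to a CSV file.

    Assumes the header is exactly "Id,SalePrice"; verify this against the
    competition's sample submission before uploading.
    """
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per test Id")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "SalePrice"])  # header row
        for house_id, price in zip(ids, predictions):
            writer.writerow([house_id, float(price)])
    return path
```

If your test data is already in a pandas DataFrame, `pd.DataFrame({"Id": ids, "SalePrice": preds}).to_csv(path, index=False)` does the same job; `index=False` keeps pandas from writing an extra unnamed index column.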

On March 18th, the competition closes to new entries and your best submission is re-evaluated on the remaining test instances (the private leaderboard). This final evaluation shows how well your model generalizes, since you may have submitted enough times against the public leaderboard subset to have overfit it.

Using Kaggle from Colab

You can download the data directly into Colab with the Kaggle API.

First, create an API token (see instructions).

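Then, in Colab, install the Kaggle API client, set your credentials, and download the competition data:

!pip install -q kaggle
import os
# Credentials come from the kaggle.json API token file you downloaded
os.environ['KAGGLE_USERNAME'] = "xxxxxx"  # "username" field from kaggle.json
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # "key" field from kaggle.json
!kaggle competitions download cs152sp21-house-prices-2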
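Two optional helpers, sketched under assumptions. Because you will share this notebook, you may prefer not to paste your key into a cell: the API token is a JSON file (usually named `kaggle.json`) with `username` and `key` fields, so the first function reads it and sets the same two environment variables as above. The second unpacks the download; the CLI typically saves competition files as a single `.zip` named after the competition, but check with `!ls` before relying on that file name. Both function names and the `data` destination folder are hypothetical.

```python
import json
import os
import zipfile


def set_kaggle_credentials(token_path="kaggle.json"):
    """Read the API token file and export it for the kaggle CLI."""
    with open(token_path) as f:
        token = json.load(f)
    # Same variables as the manual os.environ lines; `!kaggle` inherits them.
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]


def extract_competition_data(zip_path="cs152sp21-house-prices-2.zip", dest="data"):
    """Unzip the downloaded archive into dest; return the extracted file names."""
    # Assumption: the download is one zip archive; adjust zip_path if `!ls` differs.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

In Colab you could upload `kaggle.json` through the file browser, call `set_kaggle_credentials()` in place of the two `os.environ` lines, run the `!kaggle` command, and then call `extract_competition_data()`.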

Metric

The metric is the root-mean-squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sale price. (Taking logs means that relative errors count equally: predicting $110K for a $100K house is penalized the same as predicting $1.1M for a $1M house.)
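A small sketch of the metric in plain Python, assuming the natural logarithm (the choice of base only rescales the score by a constant, so rankings are unchanged). The function name `log_rmse` is hypothetical, not something Kaggle provides.

```python
import math


def log_rmse(observed, predicted):
    """RMSE between log(predicted) and log(observed); natural log assumed.

    Both sequences must have the same, non-zero length and contain only
    positive prices.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log(p) - math.log(y)) ** 2 for y, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# A 10% over-prediction costs the same for a cheap house and an expensive one.
cheap = log_rmse([100_000], [110_000])
expensive = log_rmse([1_000_000], [1_100_000])
print(round(cheap, 4), round(expensive, 4))  # prints 0.0953 0.0953
```

Because the score depends only on log prices, one option is to train on log(SalePrice) and exponentiate the model's outputs before writing the submission; whether that helps your model is worth checking on a validation split.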

Suggestions

Challenge 1 If your model is among the top 25% of teams on the private leaderboard once the in-class competition ends on March 18th, you've met this challenge.

Challenge 2 Create an ensemble model that includes at least two trained NN models as well as some other ML model like Random Forests.
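One simple way to combine the members of such an ensemble, sketched here as an assumption rather than a requirement: average each model's predictions in log space (a weighted geometric mean), which lines up with the log-based metric. The function `blend_log_predictions` is hypothetical, and any non-equal weights you pass would need to be tuned on a validation set.

```python
import math


def blend_log_predictions(model_predictions, weights=None):
    """Combine per-model price predictions by averaging their logarithms.

    model_predictions: one list of positive prices per model, all aligned to
    the same test instances. weights: one per model, summing to 1 (equal
    weighting if omitted).
    """
    n_models = len(model_predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, summing to 1")
    n_rows = len(model_predictions[0])
    if any(len(preds) != n_rows for preds in model_predictions):
        raise ValueError("every model must predict every test instance")
    blended = []
    for i in range(n_rows):
        # Weighted mean of log prices, mapped back to a price.
        log_avg = sum(w * math.log(preds[i]) for w, preds in zip(weights, model_predictions))
        blended.append(math.exp(log_avg))
    return blended


# Hypothetical usage: outputs from two trained NNs and a random forest.
# final = blend_log_predictions([nn_a_preds, nn_b_preds, forest_preds])
```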

Challenge 3 Enter your model into the open Kaggle house price regression challenge (note that although the format is the same, the data is different). Get into the top 10% of participants on the leaderboard.

If you do this challenge, set your Kaggle team name to the form NNSP21-FirstName/FirstName. In addition, make sure your notebook includes everything you did for this competition, along with a screenshot of your place on the leaderboard.

This completes the lab.

Submission instructions

  1. Make sure that the output of all cells is up to date.
  2. Rename your notebook:
    1. Click on the notebook name at the top of the window.
    2. Rename to "CS152Sp21Lab3 FirstName1/FirstName2" (using the correct lab number, along with your two first names). I need this naming so I can easily navigate through the large number of shared docs I will have by the end of the semester.
  3. Choose File/Save
  4. Share your notebook with me:
    1. Click on the Share button at the top-right of your notebook.
    2. Enter rhodes@g.hmc.edu as the email address.
    3. Click the pencil icon and select Can comment.
    4. Click on Done.
  5. Enter the URL of your Colab notebook in this submittal form. Do not copy the URL from the address bar (it may contain an authuser parameter, and I will not be able to open it). Instead, click Share and then Copy link to obtain the correct link. Enter your names in alphabetical order.
  6. At this point, you and I will go back and forth until the lab is approved.
    1. I will provide inline comments as I evaluate the submission (Google should notify you of these comments via email).
    2. You will then need to address those comments. Please do not resolve or delete the comments. I will use them as a record of our conversation. You can respond to them ("Fixed" perhaps).
    3. Once you have addressed all the comments in this round, fill out the submittal form again.
    4. Once I am completely satisfied with your lab, I will add an LGTM (Looks Good to Me) comment.
    5. At that point, set up an office hour appointment with me. I'll meet with you and your partner and we'll have a short discussion about the lab. Both of you should be able to answer questions about any part of the lab.
