Claremont Graduate University
Data Science Programming (IST 380)
Spring 2013



Welcome


This is the home page of CGU's Data Science Programming seminar -- IST 380. This is a new offering; it will include seminar-style and lab-based sessions designed to develop skills in, and insights into, the emerging field of "data science."

This page's URL:                 http://www.cs.hmc.edu/~dodds/IST380/

Submission page URL:             http://www.cs.hmc.edu/~submissions/

Meetings:                        Mondays 7:00 - 9:50 pm, Academic Computing Building (ACB) 219

Course details, grading, etc.:   see the course syllabus

Z. Dodds's HMC homepage:         http://www.cs.hmc.edu/~dodds



Course materials:


4/22/2013
    Assignment:    Week 11: Wrap-up talks, example and guidelines
    Class Slides:  Lecture 11: Example analysis
    Text Chapter:  You've finished the Data Science book!

4/15/2013
    Assignment:    Week 10: Model and feature selection
    Class Slides:  Lecture 10: Selection!
    Text Chapter:  You've finished the Data Science book!

4/8/2013
    Assignment:    Week 9: Time-series analysis
    Class Slides:  Lecture 9: Time!
    Text Chapter:  You've finished the Data Science book!

4/1/2013
    Assignment:    Week 8: NNs, SVMs, and k-NNs: Ms and Ns!
    Class Slides:  Lecture 8: NNs, SVMs, and kNNs
    Text Chapter:  Data Science, Chapter 15 (pp. 145-156)

3/25/2013
    Assignment:    Week 7: Unsupervised learning by clustering
    Class Slides:  Lecture 7: Clustering
    Text Chapter:  Data Science, Chapter 14 (pp. 129-144)

3/11/2013
    Assignment:    Week 6: Random Forests and other topics...
    Class Slides:  Lecture 6: Random Forests
    Text Chapter:  Data Science, Chapter 13 (pp. 121-128)

3/4/2013
    Assignment:    Week 5: From regression to trees...
    Class Slides:  Lecture 5: Strings and trees!
    Text Chapter:  Data Science, Chapter 12 (pp. 105-120)

2/18/2013
    Assignment:    Week 4: Plotting and Factoring with R
    Class Slides:  Lecture 4: Factors and plots in R
    Text Chapter:  Data Science, Chapter 11 (pp. 89-104)

2/11/2013
    Assignment:    Week 3: Warming up with regression!
    Class Slides:  Lecture 3: Twitter, envelopes, and linear regression
    Text Chapter:  Data Science, Chapter 10 (pp. 75-88)

2/4/2013
    Assignment:    Week 2: Functioning in R
    Class Slides:  Lecture 2: Writing functions in R
    Text Chapter:  Data Science, Chapters 6-9 (pp. 37-74)

1/28/2013
    Assignment:    Week 1: Getting started with R
    Class Slides:  Lecture 1: Welcome!
    Text Chapter:  Data Science, Chapters 1-5 (pp. 1-36)




IST 380: Why? and What?


"Data Science" has emerged as an important source of insights that can benefit scientific, business, and even personal goals. The enthusiasm around that title may have become overdone -- for example, the Harvard Business Review described Data Scientist as the "sexiest job of the 21st century."

Despite that hype, Data Science does represent the intersection of several important fields: probability and statistics, machine learning, programming, and computational infrastructure. IST 380 will develop hands-on skills in the field's primary toolkit, R, and will use one of the first texts on Data Science to do so.

Whether you're eager to start applying analysis and predictive algorithms to your own datasets or are simply seeking a thorough overview of this emerging field, my goal is for IST 380 to help you do exactly that.



IST 380: How?


This course provides a programming-based introduction to Data Science by actively involving students in designing and writing programs. Class sessions will start with a presentation on one or more technical facets of the field, e.g., fundamental algorithms, using them in R, and manipulating data sets. The ideas discussed will then be put into practice with supervised, hands-on experience in writing scripts in R in order to solve problems of increasing complexity. Weekly assignments extend this classroom experience, and an open-ended final project offers participants the chance to apply their skills to a problem of their choice.



Where do I go from here?


After IST 380, you will be able to add these points to your resume:

  • R, a widely used statistical environment and the standard tool for data science, along with many of R's features
  • S, the programming language that R implements
  • experience with descriptive statistics, predictive statistics, statistical modeling, and machine learning algorithms
In addition, through the small assignments and the large final project, you should have experience and project deliverables to demonstrate (1) a broad understanding of the interconnected parts of data science and (2) practical skills for exploring and describing data sets.

From there, you might pursue the field further, use these skills in projects of professional or personal interest, and/or consider this a capstone tour of data science as you pursue other areas of information science and technology.