I will update the notes as the course progresses. The lecture notes may help you remember the lecture content, but they are not a replacement for attending lectures.

I'll record lectures and post a link to them, but keep in mind that sometimes technical difficulties cause problems with those recordings.

Solutions are only available to students registered for this course (members of the course Google Group).
Mon Tue Wed Thu Fri
Jan 20
MLK Day
Jan 21 Jan 22
Lec 1: Intro Lecture (Video)
Reading: 1.1-1.6
HW 1 assigned: Textbook exercises 1.2, 1.3, 1.4
Jan 23 Jan 24
Jan 27
Lec 2: Multi-armed bandits (Video)
Reading: 2.1-2.4
Handout: Bandit application
HW 1 due 1 PM (solution)
Quiz 1 in class (solution)
Jan 28 Jan 29
Lec 3: Multi-armed bandits (Video)
Reading: 2.5-2.7, 2.9-2.10
PA 1 assigned
HW 2 assigned: Textbook exercises 2.1, 2.2, 2.3
Jan 30 Jan 31
Feb 03
Lec 4: Finite MDP (Video)
Reading: 3.1-3.4
Handout: MDP application
HW 2 due 1 PM (solution)
Quiz 2 in class (solution)
Feb 04 Feb 05
Lec 5: Finite MDP: Policy and Value functions (Video)
Reading: 3.5-3.8
HW 3 assigned: Textbook exercises 2.8, 3.3, 3.4, 3.8, 3.9, 3.14, 3.15, 3.16
Feb 06 Feb 07
PA 1 due 11 PM (solution)
Feb 10
Lec 6: Dynamic Programming (Video)
Reading: 4.1-4.3
PA 2 assigned
HW 3 due 1 PM (solution)
Quiz 3 in class (solution)
Feb 11
Office hours: 1-4 PM instead of 8:30-11:30 AM
Feb 12
Lec 7: Dynamic Programming (Video)
Reading: 4.4-4.8
Handout:
HW 4 assigned: Textbook exercises 4.2, 4.3, 4.5, 4.8, 4.10
Feb 13 Feb 14
Feb 17
Lec 8: Monte Carlo Methods (Video)
Reading: 5.1-5.4
HW 4 due 1 PM (solution)
Quiz 4 in class (solution)
Feb 18
PA 2 due 11 PM (solution)
Office hours: 1-4 PM instead of 8:30-11:30 AM
Feb 19
Lec 9: Monte Carlo Methods: Off-policy (Video)
Reading: 5.5-5.7, 5.9
PA 3 assigned
HW 5 assigned: Textbook exercises 5.2, 5.3, 5.5, 5.7, 5.8, 5.11, 5.13
Feb 20 Feb 21
Feb 24
Lec 10: Temporal Difference (TD) Learning (Video)
Reading: 6.1-6.2
HW 5 due 1 PM (solution)
Quiz 5 in class (solution)
Feb 25
Office hours: 8:30-11 AM instead of 8:30-11:30 AM
Feb 26
Lec 11: TD 2 (Video)
Reading: 6.3-6.4
Midterm 1 handed out
Feb 27 Feb 28
PA 3 due 11 PM (solution)
Mar 02
Lec 12: TD 3: Off-policy (Video)
Reading: 6.5-6.9
Handout:
Midterm 1 (solution) due at the beginning of class
PA 4 assigned
Mar 03 Mar 04
Lec 13: n-step Bootstrapping (Video)
Reading: 7.1-7.2
HW 6 assigned: Textbook exercises 6.2, 6.3, 6.7, 6.12, 7.4
Mar 05 Mar 06
Mar 09
Lec 14: n-step Bootstrapping: Off-policy (Video)
Reading: 7.3-7.4
HW 6 due 1 PM (solution)
Quiz 6 in class (solution)
Mar 10
PA 4 due 11 PM
Mar 11
Lec 15: Optimizing your Financial Life (Video)
Handout: Lecture15-OptimizeYourEngineeringLife.pdf
PA 5 assigned
HW 7 assigned (pdf, TeX)
Mar 12 Mar 13
Mar 16
Spring break
Mar 17
Spring break
Mar 18
Spring break
Mar 19
Spring break
Mar 20
Spring break
Mar 23
Spring break
Mar 24
Spring break
Mar 25
Spring break
Mar 26
Spring break
Mar 27
Spring break
Mar 30
Lec 16: n-step Bootstrapping: Off-policy (Slides, Edited slides, Video)
Reading: 7.5-7.7
HW 7 due 1 PM (solution)
Quiz 7 on Gradescope (solution)
Mar 31
Office hours 8:30-9AM, 10-11AM
Apr 01
Lec 17: Planning & Learning with Tabular Methods (Slides, Edited slides, Video)
Reading: 8.1-8.4
HW 8 assigned: Textbook exercises 8.1, 8.2, 8.5
Apr 02 Apr 03
PA 5 due 11 PM
Office hours 1PM-2PM
Apr 06
Lec 18: Planning & Learning with Tabular Methods (Slides, Edited slides)
Reading: 8.5-8.6, 8.8
Instructor error: no video was recorded
HW 8 due 1 PM (solution)
Quiz 8 on Gradescope (solution)
Apr 07 Apr 08
Lec 19: Planning & Learning with Tabular Methods (Slides, Edited slides, Video)
Reading: 8.9-8.13
PA 6 assigned
HW 9 assigned (pdf, TeX)
Apr 09 Apr 10
Apr 13
Lec 20: Midterm 2 review (Slides, Edited slides, Video)
Reading: Chapters 6-8
Midterm 2 handed out
Midterm 2 pdf
HW 9 due 1 PM (solution)
Apr 14 Apr 15
Lec 21: On-Policy Prediction with Approximation (Slides, Edited slides, Video)
Reading: 9.1-9.4
Handout: Gradient Descent Colab Notebook
Midterm 2 (solution) PDF available 4/13/20, due on Gradescope at the beginning of class 4/15/20
HW 10 assigned (pdf, TeX)
Apr 16 Apr 17
PA 6 due 11 PM
Apr 20
Lec 22: On-Policy Prediction with Approximation (Slides, Edited slides, Video)
HW 10 due 1 PM
Quiz 9 on Gradescope
Apr 21 Apr 22
Lec 23: Guest Lecture: Erin Talvitie, Model-based learning (Video)
PA 7 assigned
HW 11 assigned (pdf)
Apr 23 Apr 24
Apr 27
Lec 24: MCTS Review/AlphaGo (Slides, Edited slides, Video)
Reading: 16.6
HW 11 due 1 PM
Apr 28
Office hours 9AM-10AM
Apr 29
Lec 25: AlphaGo Zero (Slides, Edited slides, Video)
Reading: 16.6
Apr 30
Office hours: 12PM-1PM
May 01
PA 7 due 11 PM
Office hours: 4PM-5PM
May 04 May 05 May 06
Final for all students handed out
Final exam pdf
May 07 May 08
May 11 May 12 May 13 May 14 May 15
Final for all students PDF available 5/6/2020, due on Gradescope 5/15/20 by 5 PM


CS 181V: (Reinforcement Learning) home // Last updated Mon 15 Jun 2020 10:45:08 AM PDT