Greek Symbols (reference material)

- For each Greek letter (all lowercase, plus uppercase Gamma, Delta, Theta, Xi, Pi, Sigma, Upsilon, Phi, Psi, Omega):
  - I can provide the name (and case) when given the symbol
  - I can provide the symbol when given the name (and case)

NumPy (reference material)

- I can convert a Python array to a NumPy array
- I can use broadcasting to operate on a NumPy array with a NumPy array of fewer dimensions.
- I can do pointwise addition or multiplication of NumPy arrays
- I understand NumPy array shapes:
  - I can change the shape of an array
  - I can explain the difference between an array of shape (5,), an array of shape (5, 1), and an array of shape (1, 5)

- I can stack two NumPy arrays
- I can use indexing and slicing to extract parts of an array
- I can create a NumPy array with random elements
- I can create a NumPy array with a specified datatype
- I can use NumPy for matrix multiplication.
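The skills above can be exercised in a few lines. This is a sketch with illustrative values (none of the specific arrays come from the course material):

```python
import numpy as np

# Convert a Python list to a NumPy array, with an explicit dtype
a = np.array([1, 2, 3, 4, 5], dtype=np.float64)

# Shapes: (5,) is 1-D; (5, 1) is a column; (1, 5) is a row
col = a.reshape(5, 1)
row = a.reshape(1, 5)

# Broadcasting: (5, 1) against (1, 5) broadcasts to (5, 5)
outer = col * row

# Pointwise addition and multiplication of same-shape arrays
b = np.ones(5)
s = a + b
p = a * b

# Stacking two arrays along a new leading axis
stacked = np.stack([a, b])            # shape (2, 5)

# Indexing and slicing
first_three = a[:3]

# Random elements
r = np.random.default_rng(0).random((3, 3))

# Matrix multiplication: (1, 5) @ (5, 1) -> (1, 1)
m = row @ col
```

Broadcasting here follows NumPy's rule of aligning shapes from the trailing axis and stretching size-1 axes, which is why a (5, 1) array can operate with a (1, 5) array.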

Gradient Descent (Day 2, Day 3; fast.ai: 149-163)

- I can explain each of the pieces of the gradient descent loop:
  - Theta
  - x
  - y
  - f
  - y-hat
  - Loss function
  - Optimizer

- I can label a gradient descent loop diagram with each of the pieces
- I can run one iteration of the gradient descent algorithm by hand (given f(x), and the gradient)
- I can explain the difference between full-batch gradient descent, stochastic gradient descent and minibatch gradient descent and can explain the pros and cons of each
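One iteration can be run by hand on a toy problem. This sketch assumes a linear model f(x) = theta * x with MSE loss (an illustrative choice, not prescribed by the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])     # inputs
y = np.array([2.0, 4.0, 6.0])     # targets (the true theta is 2)
theta = 0.0                       # initial parameter
lr = 0.1                          # learning rate used by the optimizer

y_hat = theta * x                           # f(x): the prediction
loss = np.mean((y_hat - y) ** 2)            # MSE loss
grad = np.mean(2 * (y_hat - y) * x)         # d(loss)/d(theta)
theta = theta - lr * grad                   # optimizer step
```

Because `np.mean` runs over all three examples, this is full-batch gradient descent; stochastic gradient descent would compute the gradient from a single example per step, and minibatch gradient descent from a small random subset.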

General ML (Day 2, Day 8; Aggarwal: 1.4.1; fast.ai: 28-30)

- I can identify why overfitting occurs, how it can be identified, and the ways in which it can be fixed
- I can identify why underfitting occurs, how it can be identified, and the ways in which it can be fixed
- I can explain the use of training, validation and test datasets
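A minimal sketch of the three-way split (the 60/20/20 ratio and the synthetic data are arbitrary illustrations): the training set fits the parameters, the validation set guides choices such as hyperparameters and early stopping, and the test set is used once at the end for an unbiased estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))          # 100 synthetic examples

# Shuffle before splitting so each subset is representative
shuffled = data[rng.permutation(len(data))]
train, val, test = np.split(shuffled, [60, 80])
```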

Optimizers (Day 8; Aggarwal 3.5.1-3.5.3; fast.ai: 473-480)

- I can explain the use of and give code for the following optimizers:
  - Plain SGD
    - With weight decay
    - With momentum
    - With Nesterov momentum
  - Adam
  - AdamW
  - Adagrad
  - RMSProp
  - Lookahead
- I can explain which optimizers use learning rates and how learning rates are chosen
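The update rules for some of these can be sketched against a toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w (the loss, initial weights, and hyperparameter values are illustrative assumptions):

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss 0.5 * ||w||^2
    return w

w = np.array([1.0, -2.0])
lr, wd, mu = 0.1, 0.01, 0.9    # learning rate, weight decay, momentum

# Plain SGD, with weight decay folded into the gradient
w_sgd = w - lr * (grad(w) + wd * w)

# SGD with momentum: a velocity accumulates past gradients
v = np.zeros_like(w)
v = mu * v + grad(w)
w_mom = w - lr * v

# Nesterov momentum: evaluate the gradient at the look-ahead point
v_n = np.zeros_like(w)
v_n = mu * v_n + grad(w - lr * mu * v_n)
w_nag = w - lr * v_n

# Adam: per-parameter first/second moment estimates, bias-corrected
m, s = np.zeros_like(w), np.zeros_like(w)
beta1, beta2, eps, t = 0.9, 0.999, 1e-8, 1
g = grad(w)
m = beta1 * m + (1 - beta1) * g
s = beta2 * s + (1 - beta2) * g ** 2
m_hat, s_hat = m / (1 - beta1 ** t), s / (1 - beta2 ** t)
w_adam = w - lr * m_hat / (np.sqrt(s_hat) + eps)
```

All of these use a learning rate `lr`; AdamW differs from Adam by applying weight decay directly to `w` rather than folding it into the gradient, and Adagrad/RMSProp, like Adam, scale the step per parameter by accumulated squared gradients.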

Loss functions (fast.ai: 194-203, 226-237)

- I can provide code for the following loss functions and describe when each would be used:
  - Cross Entropy
  - Mean Squared Error (MSE)
  - Binary Cross Entropy
  - Negative Log Likelihood (NLL)
  - L1 Error
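Minimal NumPy sketches of each loss (function names and signatures are my own; `y` is targets, `p` is predictions):

```python
import numpy as np

def mse(y, p):                 # regression
    return np.mean((y - p) ** 2)

def l1(y, p):                  # regression, more robust to outliers
    return np.mean(np.abs(y - p))

def bce(y, p):                 # binary classification, p = sigmoid output
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def nll(log_probs, targets):   # multiclass, expects log-probabilities
    return -np.mean(log_probs[np.arange(len(targets)), targets])

def cross_entropy(logits, targets):
    # Cross entropy = log-softmax followed by NLL
    logits = logits - logits.max(axis=1, keepdims=True)   # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return nll(log_probs, targets)
```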

Activation functions (Aggarwal 1.2.1.3)

- I can provide the equation for each of the following activation functions, along with the equation for the derivative, and can identify which should be used in a given situation:
  - ReLU
  - Leaky ReLU
  - Tanh
  - Softmax
  - LogSoftmax
  - Sigmoid
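The equations and derivatives translate directly into NumPy (a sketch; softmax's full derivative is a Jacobian, so only the function is shown):

```python
import numpy as np

def relu(x):            return np.maximum(0, x)
def relu_grad(x):       return (np.asarray(x) > 0).astype(float)

def leaky_relu(x, a=0.01):      return np.where(x > 0, x, a * x)
def leaky_relu_grad(x, a=0.01): return np.where(x > 0, 1.0, a)

def sigmoid(x):         return 1 / (1 + np.exp(-x))
def sigmoid_grad(x):    s = sigmoid(x); return s * (1 - s)

# tanh itself is np.tanh; its derivative is 1 - tanh(x)^2
def tanh_grad(x):       return 1 - np.tanh(x) ** 2

def softmax(x):
    # Subtract the max for numerical stability
    e = np.exp(x - x.max())
    return e / e.sum()

def log_softmax(x):
    # Numerically safer companion to NLL than log(softmax(x))
    x = x - x.max()
    return x - np.log(np.exp(x).sum())
```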

Neural Networks (Day 10, 11; Aggarwal 1.2.1.3-1.2.3, 1.3, 1.4.2, 3.2, 3.4)

- Given a Neural Network and its input, I can calculate the output
- Given a Neural Network and its input, I can calculate the partial derivative of the loss with respect to any given parameter
- I can show how exploding gradients can occur and methods to address them
- I can show how vanishing gradients can occur and methods to address them
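A network small enough to check by hand makes the first two items concrete (weights and input here are illustrative; one hidden ReLU layer, MSE-style loss):

```python
import numpy as np

x = np.array([1.0, 2.0])
W1 = np.array([[0.5, -0.5], [0.25, 0.75]])
W2 = np.array([0.3, -0.2])
y = 1.0

# Forward pass: calculate the output
h_pre = W1 @ x                  # hidden pre-activation
h = np.maximum(0, h_pre)        # ReLU
y_hat = W2 @ h                  # network output
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: partial derivative of the loss w.r.t. each weight
d_yhat = y_hat - y
dW2 = d_yhat * h
dh = d_yhat * W2
dh_pre = dh * (h_pre > 0)       # ReLU gates the gradient
dW1 = np.outer(dh_pre, x)
```

The backward pass also shows where vanishing and exploding gradients come from: each layer multiplies the incoming gradient by its weights and activation derivative, so across many layers those repeated factors can shrink the gradient toward zero or blow it up.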

Transfer Learning (Day 16; Aggarwal 8.4.7; fast.ai: 30-33, 207-212)

- Given a pretrained model, I can explain how to repurpose it for a new task, by removing the old head and adding a new head
- I can describe the process of finetuning

Regularization (Day 6, 15; Aggarwal 1.4.1.1, 3.6, 4.4, 4.5.1.2, 4.5.4-4.5.5, 4.6)

- I can identify the use of and implement standard regularization techniques:
  - Smaller batch sizes
  - Batch normalization
  - Dropout
  - Weight Decay
  - Data Augmentation
  - Early Stopping
  - Multi-task learning
  - Ensembles
  - Larger learning rate
  - Label smoothing
  - Mixup
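Two of these are compact enough to sketch directly (inverted dropout and label smoothing; function names and parameter defaults are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    # Inverted dropout: zero each activation with probability p and
    # rescale by 1/(1-p), so the expected activation is unchanged and
    # no correction is needed at evaluation time.
    if not training:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1 - p)

def smooth_labels(one_hot, eps=0.1):
    # Replace hard 0/1 targets with softened ones: the true class gets
    # 1 - eps + eps/k, every class gets at least eps/k.
    k = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / k
```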

CNNs (Day 14-17; Aggarwal 3.5.5, 8.1-8.2.6, 8.4; fast.ai: Chapters 13-14)

- I can explain how residual networks work and explain how they can address the vanishing gradient problem
- I can compute the output of a convolutional kernel
- I can use stride and padding to control the output size of a convolutional layer
- I can compute the number of weights in a convolutional layer
- I can compute the output of a max or average or adaptive pooling layer
- I can describe the architectures of ResNet and Inception
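Computing a convolutional layer's output by hand can be checked against a naive implementation (a sketch, using cross-correlation as deep learning frameworks do; the 4x4 image and 2x2 kernel are illustrative):

```python
import numpy as np

def conv2d(img, kernel, stride=1, padding=0):
    # Naive 2-D convolution (cross-correlation) on a single channel
    img = np.pad(img, padding)
    kh, kw = kernel.shape
    # Output size: (H + 2*padding - kernel) // stride + 1
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
out = conv2d(img, k, stride=2)          # shape (2, 2)
```

For the weight-count item: a layer with `C_in` input channels, `C_out` filters, and a `kh x kw` kernel has `C_out * (C_in * kh * kw + 1)` parameters, counting one bias per filter.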

RNNs (Day 21-25; Aggarwal 7.2.1-7.2.4, 7.5-7.6; fast.ai: Chapter 12)

- I can explain how RNNs work
- I can compute output from an LSTM or GRU
- I can give the equations for an LSTM or GRU
- TBD
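The GRU equations can be checked by stepping a single cell (a sketch using one common convention; texts differ on whether z or 1-z multiplies the old state, so verify against the course's formulation):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # One GRU step (bias terms omitted for brevity)
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # new hidden state
```

With all-zero weights, both gates are 0.5 and the candidate is 0, so the new state is half the old one, which makes a quick sanity check.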

Transformers (Days 26-27)

- TBD