# Overfitting & Underfitting

Overfitting: creating a model that matches the training data so closely that the model fails to make correct predictions on new data.

Underfitting: producing a model with poor predictive ability because the model hasn't captured the complexity of the training data.

Causes of underfitting:

  • Training on the wrong set of features.
  • Training for too few epochs or at too low a learning rate.
  • Training with too high a regularization rate.
  • Providing too few hidden layers in a deep neural network.

(Figure: overfitting vs. underfitting.)

In practice, we generally choose complex models and then apply techniques to prevent overfitting.

# Early Stopping

A method for regularization that involves ending model training before training loss finishes decreasing.

In early stopping, you end model training when the loss on a validation data set starts to increase, that is, when generalization performance worsens.
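A minimal sketch of that loop, assuming hypothetical `train_one_epoch` and `validation_loss` helpers; `patience` (how many epochs to wait for a new best validation loss) is likewise an illustrative parameter:

```python
import math

def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            max_epochs=1000, patience=5):
    """Stop training once validation loss stops improving for `patience` epochs."""
    best_loss = math.inf
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)             # one pass over the training data
        val_loss = validation_loss(model)  # generalization estimate on held-out data
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1         # validation loss is no longer improving
            if epochs_since_best >= patience:
                break                      # stop before training loss bottoms out
    return model
```

In practice you would also keep a copy of the weights from the best epoch and restore them after stopping.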

(Figures: early stopping; model complexity.)

# Regularization

The penalty on a model's complexity. Regularization helps prevent overfitting.

Different kinds of regularization include:

  • L1 regularization
  • L2 regularization
  • dropout regularization

(Figure: L1 vs. L2 regularization.)

L1 regularization drives feature weights all the way to zero, so it is good for feature selection.

L2 regularization doesn't eliminate features: squaring a weight smaller than 1 gives a tiny penalty, so the pressure toward zero fades as weights shrink and they never quite reach it.
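A minimal NumPy sketch of the two penalties and their gradients; the regularization rate `lam` and the example weights are illustrative:

```python
import numpy as np

def l1_penalty(weights, lam):
    # L1: lam * sum(|w|). The gradient lam * sign(w) has constant magnitude,
    # so even tiny weights feel a full-strength push toward exactly zero.
    return lam * np.sum(np.abs(weights)), lam * np.sign(weights)

def l2_penalty(weights, lam):
    # L2: lam * sum(w^2). The gradient 2 * lam * w shrinks with the weight,
    # so small weights are barely penalized and never reach zero.
    return lam * np.sum(weights ** 2), 2 * lam * weights

w = np.array([0.9, 0.1, -0.01])
print(l1_penalty(w, 0.1))  # penalty grows linearly with each |w|
print(l2_penalty(w, 0.1))  # squaring makes the 0.1 and 0.01 terms nearly vanish
```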

# Dropout

A form of regularization useful in training neural networks.

Dropout regularization works by removing a random selection of a fixed number of the units in a network layer for a single gradient step.

The more units dropped out, the stronger the regularization.

The strength is controlled by a single parameter, the dropout rate: the probability of dropping each unit (see the sketch below).
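A minimal sketch of the common "inverted dropout" formulation, where each unit is dropped independently with probability `drop_rate` (names and shapes here are illustrative):

```python
import numpy as np

def dropout(activations, drop_rate=0.5, training=True):
    """Zero out a random fraction of units during training only."""
    if not training or drop_rate == 0.0:
        return activations                 # inference uses every unit
    keep_prob = 1.0 - drop_rate
    mask = np.random.rand(*activations.shape) < keep_prob
    # Scale by 1/keep_prob so the expected activation matches inference time.
    return activations * mask / keep_prob

h = np.ones((2, 8))                        # toy hidden-layer activations
print(dropout(h, drop_rate=0.25))          # ~25% of entries zeroed, rest scaled up
```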

# Local Minima

With gradient descent, we may get stuck in a local minimum: looking only from that point, there is no direction in which the loss decreases further, even though a better minimum may exist elsewhere.

# Random Restart

Instead of starting gradient descent from a single point, you run it from several different starting points and keep the best result (see the sketch below).
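A minimal sketch on a toy one-dimensional objective with two local minima; the function, learning rate, and number of restarts are illustrative choices:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)                  # plain descent: follow the slope
    return x

f = lambda x: x**4 - 3*x**2 + x            # shallow minimum near x ~ 1.1,
grad = lambda x: 4*x**3 - 6*x + 1          # deeper minimum near x ~ -1.3

# Run descent from several random starting points and keep the best endpoint.
rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=10)
candidates = [gradient_descent(grad, x0) for x0 in starts]
best = min(candidates, key=f)
print(best, f(best))                       # the deeper minimum wins
```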

# Momentum

To further tune your model, you could simply reduce the learning rate.

But how do you know by how much to reduce it?

Momentum takes a different approach: if you get stuck at a local minimum, a weighted average of your previous steps (an accumulated velocity) can help carry you out of it.
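A minimal sketch, reusing the toy objective from the random-restart example; the velocity decay `beta` and learning rate are illustrative:

```python
def momentum_descent(grad, x0, lr=0.01, beta=0.9, steps=500):
    """Gradient descent with momentum: the step is an exponentially
    weighted running average of past gradients (the 'velocity')."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = beta * velocity + lr * grad(x)  # average in the new gradient
        x -= velocity
    return x

f = lambda x: x**4 - 3*x**2 + x            # same toy objective as above
grad = lambda x: 4*x**3 - 6*x + 1

# Plain descent from x0 = 2.0 stops in the shallow minimum near x ~ 1.1;
# the accumulated velocity lets momentum roll through it and settle in
# the deeper minimum near x ~ -1.3.
print(momentum_descent(grad, x0=2.0))
```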
