How To Deal With Overfitting In Deep Learning Models
Say you are eating an apple for the first time and it turns out to be rotten. The first thought that comes to your mind is "apples are rotten". While this is not an accurate conclusion, it is human nature to overgeneralize when we lack a variety of experiences. When the same thing happens with a machine learning model, we call it "overfitting".
A machine learning or deep learning model is said to be overfitted if it produces a high training accuracy but a low out-of-sample accuracy.
Overfitting happens when the model is too complex relative to the amount and noisiness of the training data.
An overfitted model becomes so accustomed to the training data that it fails to perform well on any data that is not similar to, or not part of, the training set.
As a beginner in the field of Deep Learning, I found overfitting one of the most troublesome things to deal with. So, I made a list of all the techniques I've used to deal with overfitting, to make it a little easier for those going down the same road.
Regularization:
Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
Deep neural networks usually have tens of thousands of parameters, sometimes even millions. With so many parameters, the network can fit a huge variety of large datasets. But this great flexibility also means that it is prone to overfitting the training set. We need regularization.
The most common regularization techniques are ℓ1 and ℓ2 regularization, dropout, and max-norm regularization.
ℓ1 and ℓ2 Regularization:
ℓ1 and ℓ2 regularization can be used to constrain a neural network's connection weights.
The Keras library provides built-in implementations of these techniques. To learn more about ℓ1 and ℓ2 regularization, visit ℓ1 and ℓ2 Regularization.
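As a quick illustration, here is a minimal sketch of how ℓ1 and ℓ2 regularization can be attached to Keras layers via kernel_regularizer; the layer sizes and the penalty factors (0.01 and 0.001) are illustrative values, not recommendations.

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    # l2(0.01) adds 0.01 * sum(weight ** 2) to the loss for this layer
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    # l1(0.001) penalizes the absolute values of the weights instead,
    # pushing many of them toward exactly zero
    keras.layers.Dense(64, activation="relu",
                       kernel_regularizer=keras.regularizers.l1(0.001)),
    keras.layers.Dense(10, activation="softmax"),
])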
Using Dropouts:
Dropout is a fairly simple algorithm: at every training step, every neuron (including the input neurons, but always excluding the output neurons) has a probability p of being temporarily "dropped out," meaning it will be entirely ignored during this training step, but it may be active during the next step. After training, neurons don't get dropped anymore. Because no neuron can rely on its neighbors always being present, dropout prevents neurons from co-adapting too strongly and makes the network more robust.
To learn more about using dropouts for regularization, watch this video : Dropouts for Regularization.
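To make this concrete, here is a minimal sketch of dropout in Keras; the drop rate of 0.2 and the layer sizes are illustrative choices, not tuned values.

from tensorflow import keras

# Dropout(0.2) randomly zeroes 20% of the previous layer's activations at
# each training step; Keras disables it automatically at inference time.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dropout(0.2),                   # drop some input features
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),                   # drop some hidden activations
    keras.layers.Dense(10, activation="softmax"),
])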
Max-Norm Regularization:
Max-norm regularization is another popular regularization technique. It imposes an upper bound on the norm of each neuron's incoming weight vector: whenever the norm exceeds a chosen value, the weights are rescaled back down. Keeping the weights small in this way helps prevent the model from overfitting.
To learn more about weight constraints and max-norm regularization, visit this page: Weight Constraints and Max-Norm Regularization.
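For illustration, here is a minimal sketch of a max-norm constraint on a single Keras layer; the maximum norm of 2.0 and the layer size are just example values.

from tensorflow import keras
from tensorflow.keras.constraints import max_norm

# max_norm(2.0) rescales a neuron's incoming weight vector whenever its
# L2 norm exceeds 2.0 after a weight update.
layer = keras.layers.Dense(128, activation="relu",
                           kernel_constraint=max_norm(2.0))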
Some other techniques to avoid overfitting are:
Adjusting the Learning Rate:
Learning rate is a hyperparameter that determines the step size at each training iteration. Decreasing the learning rate can slow training down, but it has proven to be immensely helpful in regularizing a deep learning model.
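As a rough sketch, here is how a lower learning rate can be set in Keras, together with a ReduceLROnPlateau callback that shrinks it further when the validation loss stalls; the stand-in model, the rate of 1e-4, and the callback settings are all illustrative.

from tensorflow import keras

# A small stand-in classifier, only so the snippet is self-contained.
model = keras.Sequential([
    keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
])

# A smaller learning_rate makes each gradient step more conservative.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Optionally shrink the learning rate whenever the validation loss stops improving.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                              factor=0.5, patience=3)
# model.fit(X_train, y_train, validation_split=0.2, callbacks=[reduce_lr])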
Using a Different Optimizer or Loss Metric:
Trying different optimizers and/or loss functions to find the best fit for your DNN can not only help avoid overfitting but also greatly improve model accuracy.
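Swapping the optimizer or loss is a one-line change in Keras. Here is an illustrative sketch using SGD with momentum in place of Adam; the stand-in model and hyperparameter values are examples, not recommendations.

from tensorflow import keras

# A small stand-in classifier, only so the snippet is self-contained.
model = keras.Sequential([
    keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
])

# SGD with momentum often generalizes differently than Adam; trying both
# (and different loss functions) is an easy experiment to run.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])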
Using Image Augmentation:
One of the many reasons for overfitting is a lack of diverse training data. Image augmentation essentially "generates" more training data by applying random transformations (flips, rotations, shifts, zooms) to the existing images, creating a wider variety of examples.
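For example, Keras ships an ImageDataGenerator that applies such random transformations on the fly; the transformation ranges below are illustrative values, not tuned ones.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # rotate images by up to 15 degrees
    width_shift_range=0.1,    # shift images horizontally by up to 10%
    height_shift_range=0.1,   # shift images vertically by up to 10%
    zoom_range=0.1,           # zoom in or out by up to 10%
    horizontal_flip=True,     # randomly flip images left-right
)

# datagen.flow(X_train, y_train, batch_size=32) then yields augmented
# batches that can be passed directly to model.fit(...).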
Above are all the regularization techniques that have worked with my Deep Learning models so far. But they are just the tip of the iceberg. If these don't work for you, there are numerous other techniques and algorithms you can explore.
At the end of the day, it is about finding what fits your model best.
