Why is the validation loss increasing?

Hi everyone,

Completely new to DL/PyTorch here, so feel free to treat me like a noob. I have been training a very simple CNN on an image dataset with 3 animal categories (cats, dogs, and pandas). The architecture is shown below:

from torch import nn

class ShallowNetTorch(nn.Module):
    def __init__(self, width, height, depth, classes):
        super(ShallowNetTorch, self).__init__()

        self.width = width
        self.height = height

        # first and only conv
        self.conv1 = nn.Conv2d(in_channels=depth, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        # ReLU activation
        self.activation = nn.ReLU()

        # linear layer (width * height * 32 -> classes); padding=1 keeps the spatial size
        self.fc1 = nn.Linear(self.width * self.height * 32, classes)

    def forward(self, x):
        # apply the single convolution followed by ReLU
        x = self.activation(self.conv1(x))

        # flatten the activations
        x = x.view(-1, self.width * self.height * 32)

        # map the flattened activations to the class logits
        x = self.fc1(x)

        return x

After training the model for 100 epochs, I plotted the training/validation losses and accuracies. The training loss decreases and both the training and validation accuracy increase over the epochs, as expected, but the validation loss keeps trending upward over the 100 epochs. Please see the image below:

As the training and validation curves show, it's quite obvious that I am overfitting. However, the validation loss keeps increasing, and I am not sure why this is happening. Since the validation accuracy effectively settles at a value, I expected the validation loss to level off as well in later epochs. Am I calculating the validation loss correctly? I am attaching a link to the Jupyter notebook where I trained the ShallowNet model and calculated all the losses and accuracies plotted in the image above. [LINK]

The calculation looks alright. You could also apply torch.argmax(logits, 1) directly to the logits, since softmax is monotonic and the predicted class is therefore the same as the one you get from the softmax probabilities. :wink:
One possible reason for the increasing validation loss is that the model increases its logits for the wrong classes.
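
A minimal sketch of the argmax point, using made-up logits for a 3-class problem (the values and variable names here are illustrative assumptions, not taken from the notebook). It only shows that taking the argmax before or after softmax yields the same predicted classes:

```python
import torch

# Hypothetical logits for 3 samples and 3 classes (illustrative values only)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [-0.5, 1.5, 1.0]])

# Softmax turns each row of logits into probabilities that sum to 1
probs = torch.softmax(logits, dim=1)

# Predicted class taken directly from the logits ...
preds_from_logits = torch.argmax(logits, 1)
# ... and from the softmax probabilities
preds_from_probs = torch.argmax(probs, 1)

# Softmax preserves the ordering within each row, so both agree
print(preds_from_logits, preds_from_probs)
print(torch.equal(preds_from_logits, preds_from_probs))
```

Skipping the softmax only affects how the predictions are computed; the loss is still calculated from the logits as before.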

@ptrblck Thanks for the answer! So, to put it simply: can it be said that, due to the intense overfitting, the model now classifies the correctly classified images less confidently than before (e.g., assigning them a lower softmax probability)? That would increase the loss while keeping the accuracy relatively constant. Is that the correct interpretation of what you are saying?

That might be the case, or the model is predicting the wrong classes with higher confidence.
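
To make the second possibility concrete, here is a small pure-Python sketch with invented numbers (the "early" and "late" logits and the helper names are assumptions for illustration, not values from the notebook). Both snapshots get the same 3 of 4 validation samples right, but in the later one the single wrong prediction is made with much higher confidence, which is enough to raise the mean cross-entropy loss:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target])

def evaluate(batch):
    """Return (mean loss, accuracy) for a list of (logits, target) pairs."""
    losses = [cross_entropy(logits, target) for logits, target in batch]
    correct = sum(1 for logits, target in batch
                  if logits.index(max(logits)) == target)
    return sum(losses) / len(batch), correct / len(batch)

# Hypothetical validation outputs early in training: 3 correct, 1 mildly wrong
early = [
    ([2.0, 0.0, 0.0], 0),
    ([0.0, 2.0, 0.0], 1),
    ([0.0, 0.0, 2.0], 2),
    ([1.0, 0.0, 0.0], 1),  # wrong, but with low confidence
]

# Hypothetical outputs after overfitting: same predictions, larger logits
late = [
    ([6.0, 0.0, 0.0], 0),
    ([0.0, 6.0, 0.0], 1),
    ([0.0, 0.0, 6.0], 2),
    ([8.0, 0.0, 0.0], 1),  # still wrong, now with very high confidence
]

early_loss, early_acc = evaluate(early)
late_loss, late_acc = evaluate(late)
print(f"early: loss={early_loss:.3f}, acc={early_acc:.2f}")
print(f"late:  loss={late_loss:.3f}, acc={late_acc:.2f}")
```

In this toy case the correctly classified samples actually become more confident, yet the mean loss still goes up, because cross-entropy penalizes a confidently wrong prediction far more than it rewards a confidently right one.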