CrossEntropyLoss is not going down

Hello,
I am training the model below, but the cross entropy loss is not decreasing (it oscillates close to its initial value), even after increasing the learning rate.

I have already searched for related topics on the forum, but none of them solve my problem.
The model seems pretty straightforward and I cannot detect any mistakes by myself.
If someone can spot something unusual, it would be very helpful.

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # 1st conv block
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.25)
        )
        # 2nd conv block
        self.conv2 = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.25)
        )
        # 3rd conv block
        self.conv3 = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.25)
        )
        # 4th: first fully connected layer
        # (input is 3x64x64, leaving 64 channels at 8x8 after three 2x2 poolings)
        self.fc1 = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128),
            nn.Dropout(p=0.5)
        )
        # 5th
        self.fc2 = nn.Sequential(
            nn.Linear(128, 128),
            nn.Dropout(p=0.5)
        )
        # 6th: output layer, 22 classes (raw logits for nn.CrossEntropyLoss)
        self.fc3 = nn.Sequential(
            nn.Linear(128, 22)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 64*8*8)
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

I would recommend the opposite and try decreasing the learning rate instead.
Also, try to overfit a small data sample and make sure your model is able to do so.
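Something like this should drive the loss toward zero within a few hundred steps if the model and training code are correct (a minimal sketch; the optimizer, learning rate, and random placeholder data are my assumptions, so swap in a handful of real samples from your dataset):

import torch
import torch.nn as nn

model = Classifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Tiny fixed batch; replace the random tensors with ~10 real samples and labels.
data = torch.randn(10, 3, 64, 64)
target = torch.randint(0, 22, (10,))

model.train()
for step in range(500):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())

If the loss does not go down even on such a tiny fixed batch, the problem is in the model or training code rather than the data.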

After hours and hours, I found out that the problem is the input normalization of the dataset: I normalize with the mean and std calculated over my specific dataset.
This leads to a non-decreasing loss and to the model always producing the same output (prediction) no matter the input.
If I use the standard mean and std of [0.5, 0.5, 0.5], everything works fine.
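For reference, this is roughly the transform setup (a sketch; the per-channel mean/std in the first transform are placeholder values, not my actual computed statistics):

import torchvision.transforms as transforms

# Normalization with dataset-specific statistics (placeholder values) -> loss gets stuck
transform_dataset_stats = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.48, 0.45, 0.40], std=[0.05, 0.05, 0.05])
])

# Standard normalization -> training works fine
transform_standard = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])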
Do you have any explanation? I am a little confused…

You might want to plot some histograms of values in your dataset. That might shed some light on this.
Most likely, the normalization was destroying useful features (or squashing them so much that the network could no longer differentiate between inputs).
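A quick way to check (a rough sketch; `train_loader` is a hypothetical DataLoader that yields your normalized image batches):

import matplotlib.pyplot as plt

# Grab one normalized batch and inspect the per-channel value distributions.
images, _ = next(iter(train_loader))  # assumes a DataLoader named train_loader
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for c, ax in enumerate(axes):
    ax.hist(images[:, c].flatten().numpy(), bins=100)
    ax.set_title(f"channel {c}")
plt.show()

Comparing these histograms for the dataset-specific normalization versus the [0.5, 0.5, 0.5] one should show whether the former is collapsing the values into a very narrow or extreme range.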