Loss becoming 0 too early

Hi, I have a training set of 70 classes and 40 images/class (2800 in total), and a testing set of 350 images in total.
What happens is that the loss becomes 0 while testing accuracy is still at 58%, and everything remains constant from that point on. I’m using batch size = 5, learning rate = 0.001, momentum = 0.9. I’ve tried changing these three parameters, but the results get worse (loss becoming 0 at 30% accuracy, or the loss never decreasing). How can I solve this? Should I just try other values for these parameters?

Thank you!

[1, 560] loss: 4.250
Accuracy of the network on the test images: 2 %
[2, 560] loss: 4.210
Accuracy of the network on the test images: 2 %
[3, 560] loss: 3.903
Accuracy of the network on the test images: 5 %
[4, 560] loss: 3.469
Accuracy of the network on the test images: 15 %
[5, 560] loss: 2.995
Accuracy of the network on the test images: 20 %
[6, 560] loss: 2.351
Accuracy of the network on the test images: 25 %
[7, 560] loss: 1.795
Accuracy of the network on the test images: 40 %
[8, 560] loss: 1.247
Accuracy of the network on the test images: 40 %
[9, 560] loss: 0.865
Accuracy of the network on the test images: 44 %
[10, 560] loss: 0.572
Accuracy of the network on the test images: 45 %
[11, 560] loss: 0.376
Accuracy of the network on the test images: 46 %
[12, 560] loss: 0.279
Accuracy of the network on the test images: 46 %
[13, 560] loss: 0.163
Accuracy of the network on the test images: 44 %
[14, 560] loss: 0.151
Accuracy of the network on the test images: 46 %
[15, 560] loss: 0.107
Accuracy of the network on the test images: 54 %
[16, 560] loss: 0.015
Accuracy of the network on the test images: 58 %
[17, 560] loss: 0.001
Accuracy of the network on the test images: 58 %
[18, 560] loss: 0.000
Accuracy of the network on the test images: 58 %
[19, 560] loss: 0.000
Accuracy of the network on the test images: 59 %
[20, 560] loss: 0.000
Accuracy of the network on the test images: 59 %
[21, 560] loss: 0.000
Accuracy of the network on the test images: 59 %
[22, 560] loss: 0.000
Accuracy of the network on the test images: 59 %
[23, 560] loss: 0.000
Accuracy of the network on the test images: 59 %

Your dataset is quite small. Are you training your model from scratch? If so, could you try using a pretrained model and fine-tuning it?
Since your dataset is so small, you might also need a lot of regularization, e.g. weight decay.


Thanks for your answer. Yes, I’m training from scratch, using the following net definition (inputs are faces of size 96x96x3):

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 21 * 21, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, 70)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))

        x = x.view(-1, 16 * 21 * 21)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The problem is that when training accuracy reaches 100%, testing accuracy is only 59%.
Could modifying the net layers (increasing or decreasing the number of parameters) improve testing accuracy, or would only regularization or retraining work in this case?

It looks like your model is overfitting.

There are not many solutions to this other than adding regularization, such as weight decay or dropout layers, or increasing your training data.

It is a hot topic within ML and data science, so there are many blogs and articles on the matter.

I would start by adding an nn.Dropout() layer, and if that doesn’t improve much, try using a batch normalization layer (nn.BatchNorm2d()), as it can also act as a regularizer.
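As a concrete sketch of that advice applied to the Net posted above: the layer placement and the dropout probability p=0.5 are assumptions to tune, not the only valid configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.bn1 = nn.BatchNorm2d(6)     # batch norm after each conv
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)
        self.dropout = nn.Dropout(p=0.5) # dropout before the large FC layers
        self.fc1 = nn.Linear(16 * 21 * 21, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, 70)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = x.view(-1, 16 * 21 * 21)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.fc3(x)
        return x

# Quick shape check with a dummy batch of 96x96x3 faces
net = Net()
out = net(torch.randn(2, 3, 96, 96))
```

Remember to call `net.train()` during training and `net.eval()` during testing, since both Dropout and BatchNorm behave differently in the two modes.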