Epoch: 1 Training Loss: 0.816370 Validation Loss: 0.696534
Validation loss decreased (inf --> 0.696534). Saving model ...
Epoch: 2 Training Loss: 0.507756 Validation Loss: 0.594713
Validation loss decreased (0.696534 --> 0.594713). Saving model ...
Epoch: 3 Training Loss: 0.216438 Validation Loss: 1.119294
Epoch: 4 Training Loss: 0.191799 Validation Loss: 0.801231
Epoch: 5 Training Loss: 0.111334 Validation Loss: 1.753786
Epoch: 6 Training Loss: 0.064309 Validation Loss: 1.348847
Epoch: 7 Training Loss: 0.058158 Validation Loss: 1.839139
Epoch: 8 Training Loss: 0.015489 Validation Loss: 1.370469
Epoch: 9 Training Loss: 0.082856 Validation Loss: 1.701200
Epoch: 10 Training Loss: 0.003859 Validation Loss: 2.657933
Epoch: 11 Training Loss: 0.018133 Validation Loss: 0.593986
Validation loss decreased (0.594713 --> 0.593986). Saving model ...
Epoch: 12 Training Loss: 0.160197 Validation Loss: 1.499911
Epoch: 13 Training Loss: 0.012942 Validation Loss: 1.879732
Epoch: 14 Training Loss: 0.002037 Validation Loss: 2.399405
Epoch: 15 Training Loss: 0.035908 Validation Loss: 1.960887
Epoch: 16 Training Loss: 0.051137 Validation Loss: 2.226335
Epoch: 17 Training Loss: 0.003953 Validation Loss: 2.619108
Epoch: 18 Training Loss: 0.000381 Validation Loss: 2.746541
Epoch: 19 Training Loss: 0.094646 Validation Loss: 3.555713
Epoch: 20 Training Loss: 0.022620 Validation Loss: 2.833098
Epoch: 21 Training Loss: 0.004800 Validation Loss: 4.181845
Epoch: 22 Training Loss: 0.014128 Validation Loss: 1.933705
Epoch: 23 Training Loss: 0.026109 Validation Loss: 2.888344
Epoch: 24 Training Loss: 0.000768 Validation Loss: 3.029443
Epoch: 25 Training Loss: 0.000327 Validation Loss: 3.079959
Epoch: 26 Training Loss: 0.000121 Validation Loss: 3.578420
Epoch: 27 Training Loss: 0.148478 Validation Loss: 3.297387
Epoch: 28 Training Loss: 0.030328 Validation Loss: 2.218535
Epoch: 29 Training Loss: 0.001673 Validation Loss: 2.934132
Epoch: 30 Training Loss: 0.000253 Validation Loss: 3.215722
My loss is not converging. I am working on the Horses vs Humans dataset. There is an official TensorFlow notebook for it, and it worked like a charm; when I try to replicate the same model in PyTorch, the loss does not converge. Can you please have a look?
I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.001). The training loss does go down, but the validation losses look like random numbers and do not form any pattern. What could be the possible reasons for the loss not converging?
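For context, this is roughly how I wire the criterion and optimizer in one training step (a stripped-down sketch with a stand-in linear model and dummy data, not my actual Net or data loader):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# stand-in model: a single linear layer producing one logit per sample
model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

x = torch.randn(4, 10)                        # dummy batch of 4 samples
y = torch.randint(0, 2, (4, 1)).float()       # float targets, same shape as the logits

optimizer.zero_grad()
logits = model(x)                             # raw logits; BCEWithLogitsLoss applies the sigmoid internally
loss = criterion(logits, y)
loss.backward()
optimizer.step()
```

Note that BCEWithLogitsLoss expects float targets with the same shape as the model output, and the model must return raw logits (no sigmoid in forward).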
This is my CNN architecture:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees the 300x300x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3)
        # convolutional layer (sees 149x149x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # convolutional layer (sees 73x73x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # convolutional layer (sees 35x35x64 tensor)
        self.conv4 = nn.Conv2d(64, 64, 3)
        # convolutional layer (sees 16x16x64 tensor)
        self.conv5 = nn.Conv2d(64, 64, 3)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 7 * 7 -> 512)
        self.fc1 = nn.Linear(3136, 512)
        # linear layer (512 -> 1)
        self.fc2 = nn.Linear(512, 1)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # sequence of convolution + relu + max pooling blocks
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = self.pool(F.relu(self.conv5(x)))
        # flatten the feature maps
        x = x.view(-1, 64 * 7 * 7)
        # dropout
        x = self.dropout(x)
        # hidden layer with relu activation
        x = F.relu(self.fc1(x))
        # dropout
        x = self.dropout(x)
        # output layer (a single raw logit; BCEWithLogitsLoss applies the sigmoid)
        x = self.fc2(x)
        return x
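For reference, the flatten size of 64 * 7 * 7 = 3136 does check out, assuming the standard 300x300 Horses vs Humans input resolution. A quick arithmetic sanity check (plain Python, no torch needed):

```python
# Check the conv/pool size arithmetic behind the 64 * 7 * 7 = 3136 flatten size.
# Assumes 300x300 input images (the Horses vs Humans dataset resolution).
def conv_then_pool(size, kernel=3, pool=2):
    # valid 3x3 conv (stride 1) followed by a 2x2 max pool with stride 2
    return (size - kernel + 1) // pool

size = 300
for _ in range(5):  # five conv + pool blocks
    size = conv_then_pool(size)
print(size)  # 7 -> flattened features: 64 * 7 * 7 = 3136
```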
This is the complete Jupyter notebook. Apologies for not being able to create a minimal reproducible example.