Loss is not converging in PyTorch but does in TensorFlow

Epoch: 1 	Training Loss: 0.816370 	Validation Loss: 0.696534
Validation loss decreased (inf --> 0.696534).  Saving model ...
Epoch: 2 	Training Loss: 0.507756 	Validation Loss: 0.594713
Validation loss decreased (0.696534 --> 0.594713).  Saving model ...
Epoch: 3 	Training Loss: 0.216438 	Validation Loss: 1.119294
Epoch: 4 	Training Loss: 0.191799 	Validation Loss: 0.801231
Epoch: 5 	Training Loss: 0.111334 	Validation Loss: 1.753786
Epoch: 6 	Training Loss: 0.064309 	Validation Loss: 1.348847
Epoch: 7 	Training Loss: 0.058158 	Validation Loss: 1.839139
Epoch: 8 	Training Loss: 0.015489 	Validation Loss: 1.370469
Epoch: 9 	Training Loss: 0.082856 	Validation Loss: 1.701200
Epoch: 10 	Training Loss: 0.003859 	Validation Loss: 2.657933
Epoch: 11 	Training Loss: 0.018133 	Validation Loss: 0.593986
Validation loss decreased (0.594713 --> 0.593986).  Saving model ...
Epoch: 12 	Training Loss: 0.160197 	Validation Loss: 1.499911
Epoch: 13 	Training Loss: 0.012942 	Validation Loss: 1.879732
Epoch: 14 	Training Loss: 0.002037 	Validation Loss: 2.399405
Epoch: 15 	Training Loss: 0.035908 	Validation Loss: 1.960887
Epoch: 16 	Training Loss: 0.051137 	Validation Loss: 2.226335
Epoch: 17 	Training Loss: 0.003953 	Validation Loss: 2.619108
Epoch: 18 	Training Loss: 0.000381 	Validation Loss: 2.746541
Epoch: 19 	Training Loss: 0.094646 	Validation Loss: 3.555713
Epoch: 20 	Training Loss: 0.022620 	Validation Loss: 2.833098
Epoch: 21 	Training Loss: 0.004800 	Validation Loss: 4.181845
Epoch: 22 	Training Loss: 0.014128 	Validation Loss: 1.933705
Epoch: 23 	Training Loss: 0.026109 	Validation Loss: 2.888344
Epoch: 24 	Training Loss: 0.000768 	Validation Loss: 3.029443
Epoch: 25 	Training Loss: 0.000327 	Validation Loss: 3.079959
Epoch: 26 	Training Loss: 0.000121 	Validation Loss: 3.578420
Epoch: 27 	Training Loss: 0.148478 	Validation Loss: 3.297387
Epoch: 28 	Training Loss: 0.030328 	Validation Loss: 2.218535
Epoch: 29 	Training Loss: 0.001673 	Validation Loss: 2.934132
Epoch: 30 	Training Loss: 0.000253 	Validation Loss: 3.215722

My loss is not converging. I am working on the Horses vs. Humans dataset. There is an official TensorFlow notebook for it, and it worked like a charm. When I try to replicate the same model in PyTorch, the loss does not converge. Can you please have a look?

I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.001). The training loss does seem to go down, but the validation loss looks like random numbers and does not form any pattern. What could be the possible reasons for the loss not converging?
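
For context, this is roughly what one training step looks like (a stripped-down sketch, not the exact notebook code; train_loader is a placeholder name for my DataLoader):

import torch
import torch.nn as nn
import torch.optim as optim

model = Net()                                            # the CNN defined below
criterion = nn.BCEWithLogitsLoss()                       # expects raw logits, not probabilities
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

for images, labels in train_loader:                      # placeholder DataLoader
    optimizer.zero_grad()
    logits = model(images)                               # shape [batch, 1]
    # targets must be float and match the logit shape [batch, 1],
    # otherwise the loss may broadcast and the gradients come out wrong
    loss = criterion(logits, labels.float().view(-1, 1))
    loss.backward()
    optimizer.step()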

This is my CNN architecture:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 300x300x3 image tensor, outputs 298x298x16)
        self.conv1 = nn.Conv2d(3, 16, 3)
        # convolutional layer (sees 149x149x16 tensor, outputs 147x147x32)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # convolutional layer (sees 73x73x32 tensor, outputs 71x71x64)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # convolutional layer (sees 35x35x64 tensor, outputs 33x33x64)
        self.conv4 = nn.Conv2d(64, 64, 3)
        # convolutional layer (sees 16x16x64 tensor, outputs 14x14x64)
        self.conv5 = nn.Conv2d(64, 64, 3)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 7 * 7 = 3136 -> 512)
        self.fc1 = nn.Linear(3136, 512)
        # linear layer (512 -> 1)
        self.fc2 = nn.Linear(512, 1)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = self.pool(F.relu(self.conv5(x)))

        # flatten image input
        x = x.view(-1, 64 * 7 * 7)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # output layer (raw logits; no sigmoid here, BCEWithLogitsLoss applies it)
        x = self.fc2(x)
        return x
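
A quick shape sanity check (assuming 300x300 RGB inputs, as in the TensorFlow notebook) confirms that the flattened feature size matches fc1:

import torch

model = Net()
dummy = torch.randn(2, 3, 300, 300)   # dummy batch of two 300x300 RGB images
out = model(dummy)
print(out.shape)                       # torch.Size([2, 1]) -- raw logits, no sigmoid applied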

This is the complete Jupyter notebook. Apologies for not being able to create a minimal reproducible example.

In the official notebook, the last fully connected layer has a sigmoid activation function, but self.fc2 doesn't have one. Because the output isn't mapped between 0 and 1, your gradients may be different.

@urw7rs I have used BCEWithLogitsLoss, which has the sigmoid activation built in, so the behaviour should be the same.
Besides, I have also tried replacing that loss function with nn.BCELoss and applying a sigmoid activation at the last layer, exactly as you said, but the loss still did not converge.
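
To illustrate what I mean by the two setups being equivalent, here is a small check (illustrative only, not code from my notebook):

import torch
import torch.nn as nn

logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# BCEWithLogitsLoss on raw logits vs. BCELoss after an explicit sigmoid
loss_a = nn.BCEWithLogitsLoss()(logits, targets)
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(loss_a, loss_b))   # True (up to floating-point error)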

Have you solved the problem?