Binary classification: outputs converge to always equal 1 after a certain number of epochs

I have a binary classification model for images with only 2 labels.

import torch
import torch.nn as nn
import torch.optim as optim


class BinaryClassificationModel(nn.Module):
    def __init__(self, in_channels=1):
        super(BinaryClassificationModel, self).__init__()
        # Compute nodes after flatten
        if in_channels == 1:
            n_flat = 9216
        elif in_channels == 3:
            n_flat = 12544
        else:
            raise NotImplementedError

        self.conv1 = nn.Conv2d(in_channels, 32, 3, 1)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.relu2 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(2)
        self.flatten = nn.Flatten(1)
        self.fc1 = nn.Linear(n_flat, 200)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(200, 1)

    def forward(self, x):
        x = self.relu1(self.conv1(x))
        x = self.relu2(self.conv2(x))
        x = self.pool1(x)
        x = self.flatten(x)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        x = torch.sigmoid(x)
        return x
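
(For reference, the hard-coded n_flat values imply 28x28 single-channel or 32x32 three-channel inputs for this architecture; the quick check below is my own sanity check, not part of the training code.)

# Sanity check of the flatten sizes: 9216 = 64*12*12 corresponds to 28x28
# single-channel inputs, 12544 = 64*14*14 to 32x32 three-channel inputs.
for c, side in [(1, 28), (3, 32)]:
    dummy = torch.randn(2, c, side, side)
    out = BinaryClassificationModel(in_channels=c)(dummy)
    print(c, side, out.shape)  # torch.Size([2, 1]) in both cases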

# Here are the loss and optimization functions.
max_epochs = 100
batch_size = 192
model = BinaryClassificationModel(in_channels=n_channels).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss = nn.BCELoss()
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)
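
The training and evaluation loop itself isn't shown above; for context, here is a minimal sketch of how the logged metrics could be computed. train_loader and test_loader are assumed names for my DataLoaders, and predictions are thresholded at 0.5, which is why a constant output of 1 lands near 50% accuracy on a roughly balanced test set.

# Minimal sketch of the training/evaluation loop (not the exact code I run);
# train_loader and test_loader are assumed to yield (images, labels) batches
# with integer labels in {0, 1}.
for epoch in range(max_epochs):
    model.train()
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # shape (N, 1) for BCELoss
        optimizer.zero_grad()
        outputs = model(images)              # sigmoid probabilities in (0, 1)
        batch_loss = loss(outputs, labels)
        batch_loss.backward()
        optimizer.step()
    scheduler.step()

    # Evaluate: threshold the probabilities at 0.5 to get hard predictions.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)
            preds = (model(images) > 0.5).float()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    print(f"{epoch + 1}/{max_epochs} Test Acc: {100.0 * correct / total:.2f}%")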

There are 16000 samples in the training set, and 2000 in the test set.

After a few epochs, I noticed the outputs of the model always equal 1, resulting in about 50% accuracy.
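
For reference, this is roughly how I checked the outputs (the snippet is illustrative; test_loader is my test DataLoader):

# Illustrative check of the raw sigmoid outputs on one test batch,
# to see whether they have all saturated at 1.
model.eval()
with torch.no_grad():
    images, _ = next(iter(test_loader))
    probs = model(images.to(device)).squeeze(1)
    print(probs.min().item(), probs.mean().item(), probs.max().item())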

Here are the losses during training:

 1/100[0:00:00.726472] Train Loss: 0.6765 Acc: 59.12%, Test Loss: 0.6608 Acc: 65.80%
 2/100[0:00:00.762435] Train Loss: 0.5176 Acc: 74.36%, Test Loss: 0.5115 Acc: 70.30%
 3/100[0:00:00.720053] Train Loss: 0.4273 Acc: 80.51%, Test Loss: 0.3375 Acc: 89.10%
 4/100[0:00:00.732783] Train Loss: 0.4059 Acc: 82.93%, Test Loss: 0.8257 Acc: 49.05%
 5/100[0:00:00.762209] Train Loss: 0.4373 Acc: 78.19%, Test Loss: 0.1853 Acc: 97.50%
 6/100[0:00:00.745140] Train Loss: 0.4131 Acc: 81.58%, Test Loss: 0.3362 Acc: 89.10%
 7/100[0:00:00.735580] Train Loss: 0.3309 Acc: 86.67%, Test Loss: 0.5465 Acc: 72.70%
 8/100[0:00:00.736106] Train Loss: 0.4639 Acc: 76.42%, Test Loss: 0.6769 Acc: 58.90%
 9/100[0:00:00.729786] Train Loss: 0.5791 Acc: 67.83%, Test Loss: 0.6500 Acc: 59.35%
10/100[0:00:00.736766] Train Loss: 0.4534 Acc: 78.09%, Test Loss: 0.3268 Acc: 87.80%
11/100[0:00:00.745327] Train Loss: 0.5442 Acc: 72.00%, Test Loss: 0.4901 Acc: 79.05%
12/100[0:00:00.774917] Train Loss: 0.4394 Acc: 79.23%, Test Loss: 0.9141 Acc: 53.85%
13/100[0:00:00.731415] Train Loss: 0.4538 Acc: 79.04%, Test Loss: 0.4446 Acc: 82.30%
14/100[0:00:00.754783] Train Loss: 0.3903 Acc: 82.59%, Test Loss: 0.2820 Acc: 91.25%
15/100[0:00:00.729260] Train Loss: 0.3691 Acc: 83.51%, Test Loss: 0.3868 Acc: 85.20%
16/100[0:00:00.739179] Train Loss: 0.3336 Acc: 85.91%, Test Loss: 0.2248 Acc: 95.00%
17/100[0:00:00.740206] Train Loss: 0.3841 Acc: 83.02%, Test Loss: 0.3675 Acc: 84.20%
18/100[0:00:00.755173] Train Loss: 0.3500 Acc: 84.25%, Test Loss: 0.2323 Acc: 90.05%
19/100[0:00:00.733362] Train Loss: 0.7047 Acc: 72.61%, Test Loss: 0.7471 Acc: 55.50%
20/100[0:00:00.756719] Train Loss: 0.6714 Acc: 58.11%, Test Loss: 0.7091 Acc: 53.05%
21/100[0:00:00.732703] Train Loss: 0.6076 Acc: 64.31%, Test Loss: 0.5745 Acc: 70.95%
22/100[0:00:00.735010] Train Loss: 0.6113 Acc: 62.93%, Test Loss: 0.6821 Acc: 61.20%
23/100[0:00:00.733867] Train Loss: 0.6519 Acc: 59.27%, Test Loss: 0.7027 Acc: 48.40%
24/100[0:00:00.730408] Train Loss: 0.6747 Acc: 54.44%, Test Loss: 0.6827 Acc: 54.50%
25/100[0:00:00.734836] Train Loss: 0.6519 Acc: 58.29%, Test Loss: 0.6967 Acc: 48.40%
26/100[0:00:00.730756] Train Loss: 0.6935 Acc: 49.80%, Test Loss: 0.6928 Acc: 51.60%
27/100[0:00:00.732392] Train Loss: 0.6930 Acc: 50.98%, Test Loss: 0.6927 Acc: 51.60%
28/100[0:00:00.768368] Train Loss: 0.6930 Acc: 50.98%, Test Loss: 0.6928 Acc: 51.60%
29/100[0:00:00.733148] Train Loss: 0.6930 Acc: 50.98%, Test Loss: 0.6927 Acc: 51.60%

Any clue what is going wrong?
Thanks.

Training loss should not increase during training. Could you decrease the learning rate? I think overshooting is happening, especially between epochs 18 and 19. I would also suggest using the Adam optimizer instead of SGD; it will converge faster.
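
For concreteness, the change could look something like this (the 1e-3 learning rate is just an illustrative starting point, not a tuned value):

# Sketch of the suggested change: Adam with a smaller learning rate instead of
# SGD with momentum; keep the cosine schedule if you still want it.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)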

Thanks.
I'll definitely try Adam and AdamW.
You are right, I did notice the training loss increasing for some epochs, but I thought it was probably due to the mini-batches.

Is it 100 batches or 100 epochs?