Loss remains constant/unchanged

I was recently practicing building an image classification model, and the training loss is not changing. Can someone guide me on what I am doing wrong?

I am using the dataset available on Kaggle here.
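
The image_loader used in the training script is built along these lines (a minimal sketch: the dataset path and the 140×140 resize are assumptions, and the batch size of 16 is inferred from the accuracy values in the log below):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Resize so the flattened conv output matches fc1's expected 4*4*256 features.
transform = transforms.Compose([
    transforms.Resize((140, 140)),
    transforms.ToTensor(),
])

# ImageFolder layout assumed: one subdirectory per class under data/train.
dataset = datasets.ImageFolder('data/train', transform=transform)
image_loader = DataLoader(dataset, batch_size=16, shuffle=True)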
Model Definition:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=(3, 3))
        self.conv2 = nn.Conv2d(16, 64, kernel_size=(3, 3))
        self.conv3 = nn.Conv2d(64, 256, kernel_size=(3, 3))
        self.maxpool = nn.MaxPool2d(2, stride=3)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(4*4*256, 256)
        self.fc2 = nn.Linear(256, 32)
        self.fc3 = nn.Linear(32, 10)
            
    def forward(self, image):
        image = F.relu(self.conv1(image))
        image = self.maxpool(image)
        image = F.relu(self.conv2(image))
        image = self.maxpool(image)
        image = F.relu(self.conv3(image))
        image = self.maxpool(image)
        
        image = self.flatten(image)
        
        image = F.relu(self.fc1(image))
        image = F.relu(self.fc2(image))
        image = image.reshape(image.shape[0], -1)
        image = F.relu(self.fc3(image))
        
        return image
        
model = Model()
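
As a quick sanity check of the 4*4*256 input size of fc1, you can trace a dummy batch through the conv/pool stack (a sketch, assuming 140×140 inputs; the ReLUs are omitted since they don't change shapes):

x = torch.randn(1, 3, 140, 140)
for layer in [model.conv1, model.maxpool, model.conv2, model.maxpool,
              model.conv3, model.maxpool]:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Ends at (1, 256, 4, 4), which matches nn.Linear(4*4*256, 256).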

And the training script:

import matplotlib.pyplot as plt

device = 'cpu'
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

epochs = 5
training_loss = []
accuracy = []
thresh = 50
iters = 0
total_loss = 0
for e in range(epochs):
    for sample, label in image_loader:
        sample, label = sample.to(device), label.to(device)
        optimizer.zero_grad()
        output = model(sample)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if iters % thresh == 0:
            pred = torch.argmax(output, dim=1)
            correct = pred.eq(label)
            acc = torch.mean(correct.float())
            print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(e+1, epochs, iters, loss/thresh, acc))
            training_loss.append(loss.item())
            accuracy.append(acc.item())
            total_loss = 0
        iters += 1
plt.plot(training_loss, label='loss')
plt.plot(accuracy, label='accuracy')
plt.legend()
plt.title('training loss and accuracy')
plt.show()

Below are the loss and accuracy during training:

[Epoch 1/5] Iteration 0 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 1/5] Iteration 50 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 100 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 1/5] Iteration 150 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 200 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 1/5] Iteration 250 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 1/5] Iteration 300 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 1/5] Iteration 350 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 400 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 1/5] Iteration 450 -> Train Loss: 0.0402, Accuracy: 0.000
[Epoch 1/5] Iteration 500 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 550 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 1/5] Iteration 600 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 650 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 1/5] Iteration 700 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 1/5] Iteration 750 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 800 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 1/5] Iteration 850 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 900 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 2/5] Iteration 950 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1000 -> Train Loss: 0.0404, Accuracy: 0.062
[Epoch 2/5] Iteration 1050 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 2/5] Iteration 1100 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 2/5] Iteration 1150 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1200 -> Train Loss: 0.0402, Accuracy: 0.000
[Epoch 2/5] Iteration 1250 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 2/5] Iteration 1300 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1350 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 2/5] Iteration 1400 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 2/5] Iteration 1450 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1500 -> Train Loss: 0.0402, Accuracy: 0.438
[Epoch 2/5] Iteration 1550 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1600 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 2/5] Iteration 1650 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 2/5] Iteration 1700 -> Train Loss: 0.0402, Accuracy: 0.375
[Epoch 2/5] Iteration 1750 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 3/5] Iteration 1800 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 3/5] Iteration 1850 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 3/5] Iteration 1900 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 3/5] Iteration 1950 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 3/5] Iteration 2000 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 3/5] Iteration 2050 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 3/5] Iteration 2100 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 3/5] Iteration 2150 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 3/5] Iteration 2200 -> Train Loss: 0.0402, Accuracy: 0.375
[Epoch 3/5] Iteration 2250 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 3/5] Iteration 2300 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 3/5] Iteration 2350 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 3/5] Iteration 2400 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 3/5] Iteration 2450 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 3/5] Iteration 2500 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 3/5] Iteration 2550 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 3/5] Iteration 2600 -> Train Loss: 0.0405, Accuracy: 0.125
[Epoch 4/5] Iteration 2650 -> Train Loss: 0.0402, Accuracy: 0.375
[Epoch 4/5] Iteration 2700 -> Train Loss: 0.0402, Accuracy: 0.000
[Epoch 4/5] Iteration 2750 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 4/5] Iteration 2800 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 4/5] Iteration 2850 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 2900 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 2950 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 3000 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 3050 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 3100 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 4/5] Iteration 3150 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 4/5] Iteration 3200 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 4/5] Iteration 3250 -> Train Loss: 0.0402, Accuracy: 0.125
[Epoch 4/5] Iteration 3300 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 4/5] Iteration 3350 -> Train Loss: 0.0402, Accuracy: 0.250
[Epoch 4/5] Iteration 3400 -> Train Loss: 0.0402, Accuracy: 0.312
[Epoch 4/5] Iteration 3450 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 4/5] Iteration 3500 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 5/5] Iteration 3550 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 5/5] Iteration 3600 -> Train Loss: 0.0402, Accuracy: 0.062
[Epoch 5/5] Iteration 3650 -> Train Loss: 0.0402, Accuracy: 0.188
[Epoch 5/5] Iteration 3700 -> Train Loss: 0.0402, Accuracy: 0.062

Remove the last F.relu so that your model is able to return negative and positive logits and rerun the script.
If that doesn’t help, try to overfit a small dataset (e.g. just 10 samples) by playing around with the hyperparameters.
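
For reference, the forward would then look like this (a sketch; the redundant reshape after flatten is also dropped, since flatten already returns a 2D tensor):

    def forward(self, image):
        image = F.relu(self.conv1(image))
        image = self.maxpool(image)
        image = F.relu(self.conv2(image))
        image = self.maxpool(image)
        image = F.relu(self.conv3(image))
        image = self.maxpool(image)

        image = self.flatten(image)

        image = F.relu(self.fc1(image))
        image = F.relu(self.fc2(image))
        # No activation on the last layer: nn.CrossEntropyLoss applies
        # log_softmax internally and expects raw logits.
        return self.fc3(image)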


I tried this, but it didn’t help. When I removed the F.relu from the last layer, the loss deviated a bit, but only by about ±0.002, and the accuracy didn’t improve either.

I also added a large number of convolutional layers to try to overfit the model, but the loss still stays the same.

The model itself seems to be working and is able to overfit a small data sample perfectly:

model = Model()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(10, 3, 140, 140)
target = torch.randint(0, 10, (10,))
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    optimizer.zero_grad()
    out = model(data)
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))

model.eval()
pred = model(data)
pred = torch.argmax(pred, dim=1)
acc = (pred == target).float().mean()
print(acc)
> tensor(1.)

The model is correctly overfitting, which means there must be an issue with the training loop.
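
One thing in the loop does stand out: the print reports loss/thresh, i.e. the latest batch loss divided by 50 (≈ 2.01/50 = 0.0402), not the running average total_loss/thresh. That both hides the real scale of the loss (~2.0, close to the ln(10) ≈ 2.3 you'd expect for 10 untrained classes) and squashes any variation below the 4 printed decimals. A sketch of a cleaned-up loop, reusing the names from the script above (dropping lr to 1e-3, Adam's common default, is a suggestion to try, not a verified fix):

# model, criterion, image_loader, and device as defined earlier in the thread.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # try a smaller lr

epochs = 5
thresh = 50
training_loss = []
accuracy = []
iters = 0
total_loss = 0.0

for e in range(epochs):
    for sample, label in image_loader:
        sample, label = sample.to(device), label.to(device)
        optimizer.zero_grad()
        output = model(sample)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        iters += 1
        if iters % thresh == 0:
            avg_loss = total_loss / thresh  # mean loss over the last 50 batches
            acc = torch.argmax(output, dim=1).eq(label).float().mean().item()
            print('[Epoch {}/{}] Iteration {} -> Train Loss: {:.4f}, Accuracy: {:.3f}'.format(
                e + 1, epochs, iters, avg_loss, acc))
            training_loss.append(avg_loss)
            accuracy.append(acc)
            total_loss = 0.0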