Loss staying the same throughout training

I’m using CNNs for age and gender detection. While training my model, the loss seems to stay the same and I’m not sure why. I’d appreciate any help!

import torch
import torch.nn as nn
import torch.nn.functional as F

class neural_network(nn.Module):
    def __init__(self):
        super().__init__()
            
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.conv3 = nn.Conv2d(64, 128, 5)
#         self.conv4 = nn.Conv2d(128, 256, 3)
        self.flatten = nn.Flatten()
        self.output = nn.Linear(512, 2)  # 128 channels * 2 * 2 spatial after three conv+pool blocks on a 48x48 input

    def forward(self, x):
        x = x.view(-1,1,48,48)
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2,2))
#         x = F.max_pool2d(F.relu(self.conv4(x)), (2,2))
        
        x = self.flatten(x)
        x = self.output(x)
        return F.softmax(x, dim=1)

neural_net = neural_network()

import torch.optim as optim
from tqdm import tqdm

optimizer = optim.Adam(neural_net.parameters(), lr=1e-4, weight_decay = 5e-5)
loss_function = nn.NLLLoss()

EPOCHS = 3
BATCH_SIZE = 10
y = gender_labels
y = y.type(torch.LongTensor)

correct = 0
predictions = []
correct_labels = []
total = 0

for epoch in range(EPOCHS):
    for i in tqdm(range(0, len(X), BATCH_SIZE)):
        
        optimizer.zero_grad()
        train_X = X[i:i+BATCH_SIZE]
        train_y = y[i:i+BATCH_SIZE]
        output = neural_net(train_X)
        loss = loss_function(output, train_y)
        loss.backward()
        optimizer.step()
        
        with torch.no_grad():
            for idx, out in enumerate(output):  # renamed from `i` so it doesn't shadow the batch index
                if torch.argmax(out) == train_y[idx]:
                    correct += 1
                predictions.append(torch.argmax(out).tolist())
                correct_labels.append(train_y[idx].tolist())
                total += 1
    print(loss)

print("Accuracy: ", round(correct/total, 3))

And this is my output:

100%|██████████| 2368/2368 [02:53<00:00, 13.67it/s]
  0%|          | 2/2368 [00:00<02:36, 15.08it/s]
tensor(-1.0000, grad_fn=<NllLossBackward>)
100%|██████████| 2368/2368 [03:08<00:00, 12.54it/s]
  0%|          | 2/2368 [00:00<02:52, 13.74it/s]
tensor(-1.0000, grad_fn=<NllLossBackward>)
100%|██████████| 2368/2368 [03:38<00:00, 10.83it/s]
tensor(-1.0000, grad_fn=<NllLossBackward>)
Accuracy:  0.457

Once again, thanks for the help!

You first have to instantiate the model (as in object-oriented programming), and then use it. So use the following instead:

# before the epoch-loop
model = neural_network()
...
# in the loop
output = model(train_X)

Oh, I forgot to show where I initialize it. The variable name is neural_net, which is different from the neural_network class. I’ve updated the code, my bad on that.

You may need to put your model in training mode in the training loop. (Training mode is the default for a freshly constructed module, but it’s good practice to set it explicitly.)

neural_net.train()

For evaluation/testing it should be set to evaluation mode.

neural_net.eval()
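
Roughly like this (just a sketch, assuming the same loop structure as in your post):

neural_net.train()   # training-mode behaviour (dropout, batchnorm, etc.)
for epoch in range(EPOCHS):
    for i in tqdm(range(0, len(X), BATCH_SIZE)):
        ...  # forward pass, loss, backward, optimizer step as above

neural_net.eval()    # switch to evaluation mode before validating/testing
with torch.no_grad():
    ...  # run the model on held-out data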

Made no difference, the loss is still staying the same.

As the NLLLoss documentation says, it expects log-probabilities as input, i.e. the output of LogSoftmax.
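
NLLLoss just picks out -input[target], so if you feed it softmax probabilities the loss is bounded in [-1, 0] and saturates at -1.0, exactly like in your output; log-probabilities give the actual negative log-likelihood. A quick demonstration with made-up logits, just to show the difference:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0]])
target = torch.tensor([0])

# softmax probabilities into NLLLoss: bounded in [-1, 0], saturates at -1
print(F.nll_loss(F.softmax(logits, dim=1), target))      # tensor(-0.9526)

# log-probabilities into NLLLoss: the proper negative log-likelihood
print(F.nll_loss(F.log_softmax(logits, dim=1), target))  # tensor(0.0486)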

So would softmax work with Adam then?

What Alex means is that in your model you should replace softmax with log_softmax.
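
Concretely, only the last line of your forward needs to change (alternatively, return the raw logits and use nn.CrossEntropyLoss, which combines LogSoftmax and NLLLoss for you). A minimal sketch based on your model:

    def forward(self, x):
        x = x.view(-1, 1, 48, 48)
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))
        x = self.flatten(x)
        x = self.output(x)
        return F.log_softmax(x, dim=1)  # log-probabilities, which is what NLLLoss expects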

Oh, I forgot to mark it as the solution. Yeah, it works, but the loss is around 0.8. I’m thinking I might need to change the loss function so that the loss decreases further. How do you think I can decrease the loss and get it closer to 0, since it’s still quite high?