Hello, my loss is reported as zero.

epoch: 25 Loss: 0.0 %
output : tensor([[-21.6479, -24.8251,  46.5167],
        [ -7.8592, -16.6111,  24.4777]], device='cuda:0')
target : tensor([2, 2], device='cuda:0')

The predicted values are correct.
Because they are correct, the loss is reported as zero.
So learning is not going on.

Thank you, everyone.

Can you explain what the question is?

Roy

The shape of output and target isn’t the same. This shouldn’t be the expected result, right?

If not, how do you define your comparison of output and target to calculate loss? Are you defining a custom loss function for this?

My code classifies three classes. I use this:

criterion = nn.CrossEntropyLoss().to(device)

optimizer_ft= optim.SGD(model_ft.parameters(), lr=0.01, momentum=0.9)


loss = criterion(outputs, target)

output: tensor([[-21.6479, -24.8251,  46.5167],
        [ -7.8592, -16.6111,  24.4777]], device='cuda:0')

target: tensor([2, 2], device='cuda:0')

I don’t know how to compare output and target.

Your loss is fine, so I'm still not sure what the question is. Are you asking how to check the accuracy?

loss = criterion(outputs, target)
is correct. CrossEntropyLoss applies LogSoftmax to your logits and then NLLLoss.
Your output tensor looks like logits for 3 classes, and both samples in outputs pick class 2 by a large margin over the other classes.

Your target says class 2 for the first sample and class 2 for the second sample,
so in that case loss = 0.0 is what you should expect (and perfectly fine in this example).
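
To make the arithmetic concrete, here is a minimal sketch in plain Python (no PyTorch needed) that recomputes the cross entropy for the two samples you posted as log-softmax followed by negative log-likelihood. The helper names log_softmax and cross_entropy are my own for illustration, not PyTorch functions, and the [2, 1] target is a made-up counterexample.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    # m + log(sum(exp(x - m))) is the log-sum-exp of the logits.
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cross_entropy(batch_logits, targets):
    """Mean negative log-likelihood of the target class (illustrative helper)."""
    losses = [-log_softmax(row)[t] for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Logits copied from the original post.
outputs = [[-21.6479, -24.8251, 46.5167],
           [-7.8592, -16.6111, 24.4777]]

# Targets from the post: both samples belong to class 2.
loss_correct = cross_entropy(outputs, [2, 2])

# Hypothetical targets where the second sample is actually class 1.
loss_wrong = cross_entropy(outputs, [2, 1])

print(loss_correct)  # vanishingly small
print(loss_wrong)    # large, because the second prediction is confidently wrong
```

With margins this large, the target class gets a softmax probability so close to 1 that its negative log is far below what float32 (PyTorch's default dtype) can distinguish from zero, which is why the log prints 0.0. A confidently wrong prediction, by contrast, is penalised heavily.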

So when you say:

So learning is not going on.

that seems strange, because it looks like learning did happen, and it went very well (maybe even overfitting :man_shrugging:t2:).

In case your question was how to check the accuracy of the model, you probably want something like CategoricalAccuracyWithLogits. The lpd package has this metric and several others, so you may choose to use it if you want a structured training process and metrics.

For the sake of simplicity, you can use this snippet:

def categorical_accuracy_with_logits(y_pred, y_true):
    indices = torch.max(y_pred, 1)[1]
    correct = torch.eq(indices, y_true).view(-1)
    accuracy = correct.float().sum() / correct.shape[0]
    return accuracy

Roy

Thanks, RoySadaKa.

I checked the accuracy, but it is always 1.
The loss keeps repeating, like this:

Can you show the same image, but for epochs 0, 1, and 2?

Roy

Thanks


Thank you :slight_smile: I'm not sure I fully understand the log, but it seems the loss is > 0 in the first few epochs and reaches 0 at epoch 1408. Can you explain what your concerns are?

Also, can you explain the problem you are trying to solve? I see the input shape is (8, 3, 40, 128, 128). Please provide some context about the input. I'm guessing the images are 128x128 with 3 channels and your batch size is 8, but I'm not sure what the 40 is.

Knowing what you are trying to do can help a lot, and so would sharing some code.

Roy

Thanks for the answer.
I am trying to build a 3D CNN for action recognition.
(8, 3, 40, 128, 128) => (batch, channel, frame_len, height, width)
All frame_len images of a clip go in at once, and the model returns one label (e.g. fight, walk, stand…).

This is my training code:

# model_ft, criterion, optimizer_ft, train_loader, test_loader, device, nb_epoch,
# value_tracker, acc_check, loss_plt and acc_plt are defined elsewhere.
import torch

ave_loss = 0
epoch_loss = 0
for epoch in range(nb_epoch):
    print("ㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡ", epoch)
    running_loss = 0.0
    epoch_num = 1

    for i, data in enumerate(train_loader, 0):
        model_ft.train()

        inputs, labels = data
        inputs = inputs.to(device)

        # Convert per-class label vectors to class indices for CrossEntropyLoss.
        labels = torch.argmax(labels, dim=1)
        labels = labels.to(device)

        optimizer_ft.zero_grad()

        # (batch, frame_len, height, width, channel) -> (batch, channel, frame_len, height, width)
        inputs = inputs.permute(0, 4, 1, 2, 3)
        print("input shape : ", inputs.shape)
        outputs = model_ft(inputs)

        _, predicted = torch.max(outputs.data, 1)

        loss = criterion(outputs, labels)
        print(i)
        print('epoch: {}  Loss: {} %'.format(epoch, loss.item()))

        loss.backward()
        optimizer_ft.step()

        running_loss += loss.item()
        epoch_loss = running_loss / (i + 1)

        print("epoch_loss : ", epoch_loss)
        # Accuracy on the current training batch only, not on validation data.
        print("ACC : ", categorical_accuracy_with_logits(outputs, labels))

        value_tracker(loss_plt, torch.Tensor([epoch_loss]), torch.Tensor([i + epoch * len(train_loader)]))
        epoch_num += 1

    running_loss = 0.0
    if epoch % 5 == 0:
        acc = acc_check(model_ft, test_loader, epoch, save=1)
        value_tracker(acc_plt, torch.Tensor([acc]), torch.Tensor([epoch]))

    if epoch % 20 == 0:
        torch.save(model_ft, "D:/data/3dcnn_weights/basic128n_epoch_{}_acc_{}.pt".format(epoch, int(acc)))

print('Finished Training')

I used the accuracy function you suggested, but on the training batches it always comes out as 1.
The validation accuracy, however, is low.

Thank you so much.

Using the data you provided in the initial post, the accuracy function behaves as expected:

In  [1]: import torch
In  [2]: a = torch.tensor([[-21.6479, -24.8251, 46.5167],[-7.8592, -16.6111, 24.4777]])
In  [3]: a
Out [3]: tensor([[-21.6479, -24.8251,  46.5167],
                 [ -7.8592, -16.6111,  24.4777]])

In  [4]: b = torch.tensor([2, 2])
In  [5]: categorical_accuracy_with_logits(a,b)
Out [5]: tensor(1.)

In  [6]: b = torch.tensor([2, 1])
In  [7]: categorical_accuracy_with_logits(a,b)
Out [7]: tensor(0.5000)

In  [8]: b = torch.tensor([0, 0])
In  [9]: categorical_accuracy_with_logits(a,b)
Out [9]: tensor(0.)

Maybe the model is overfitting the training set (you showed 1408 epochs, which may be too many). Have you considered using EarlyStopping? It stops training when some metric has not improved for N epochs, e.g. the loss is no longer decreasing or the accuracy is no longer increasing.
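
As a rough illustration of the idea, here is a minimal hand-rolled sketch in plain Python. It is not the API of lpd or any other package; the class, its patience and min_delta parameters, and the validation losses are all made up for the example. In your loop you would feed it the validation loss (or 1 - validation accuracy) once per epoch.

```python
class EarlyStopping:
    """Signals a stop when a monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # how many non-improving checks to tolerate
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, value):
        """Record a new loss value; return True when training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation losses, for illustration only.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.70, 0.71]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best validation loss", stopper.best)
```

The important design choice is to monitor a validation metric rather than the training loss: the training loss can keep falling toward 0 while the validation metric stalls or gets worse, which matches the symptoms you describe.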

Roy