I am trying to do multi-class image classification.
I am trying to debug my network for potential bugs, so training and validation run on the same subset of data. Logically, the training and validation loss should decrease and then saturate, which is happening; but the model should also reach 100% (or very high) accuracy on the validation set, since it is the same as the training set. Instead it gives 0% accuracy.
So I am wondering whether my calculation of accuracy is correct. Below is my code; could you please check it and point out any bugs I have not been able to find?
model.eval()
for batch_idx, (data, target) in enumerate(loader['valid']):
    # move to GPU
    if torch.cuda.is_available():
        data, target = data.cuda(), target.cuda()
    # update the average validation loss
    output = model(data).squeeze()
    output = torch.unsqueeze(output, 0)
    loss = criterion(output, target)
    valid_loss += ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
    pred = output.data.max(1, keepdim=True)[0]
    correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred)).cpu().numpy()))
    total += data.size(0)
accuracy = 100. * (correct / total)
print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \tValidation Accuracy: {:.6f}'.format(
    epoch,
    train_loss,
    valid_loss,
    accuracy
))
When you use pred = output.data.max(1, keepdim=True)[0], index [0] of the returned tuple holds the maximum values, while index [1] holds the argmax. You probably want to compare the argmax of your predictions with your target labels, so pred = output.data.max(1, keepdim=True)[1] should work.
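To make the values/indices distinction concrete, here is a minimal sketch with made-up logits (2 samples, 3 classes):

```python
import torch

# Hypothetical raw logits for a batch of 2 samples over 3 classes.
output = torch.tensor([[0.1, 2.0, -0.3],
                       [1.5, 0.2, 0.9]])

# torch.max along dim=1 returns a (values, indices) tuple.
values, indices = output.max(1, keepdim=True)
# values  -> the largest logit in each row (element [0] of the tuple)
# indices -> the argmax, i.e. the predicted class labels (element [1])
print(values.squeeze(1).tolist())   # [2.0, 1.5]
print(indices.squeeze(1).tolist())  # [1, 0]
```

Comparing values against integer targets will almost never match, which is why accuracy comes out as 0%; comparing indices is what you want.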
Thanks @Caruso for answering.
It works, but there is a blind spot in my understanding that I can only partly grasp and would like to clear up.
Suppose the model predicts the following values: 0.14, 1.2, -0.3.
How are the indices mapped to the labels? That is, how do we know that the first index maps to the first label, and similarly for the other indices?
Just to be precise, these are not “probabilities.” (If they were, they
would be values between 0 and 1 that sum to 1.) They are most
likely the output of a final Linear layer (with no subsequent non-linear
“activation”), and should be understood as raw-score logits.
The output of a model means whatever you train it to mean.
If your target values are integer categorical labels, and your criterion
(loss function) is CrossEntropyLoss, then you will be training your
model so that element [0] of its output maps to the class whose integer
label is 0, element [1] maps to class label 1, and so on.
To reiterate, the relationship between the output of your model and
the class labels you use to train your model depends entirely on what
you train your model to do.
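As a minimal sketch of this pairing (the logit values are made up), CrossEntropyLoss rewards the model when the logit at the target's index is the largest one, which is what ties output positions to class labels during training:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical raw logits for one sample over 3 classes.
logits = torch.tensor([[0.14, 1.2, -0.3]])

# A target of 1 tells the loss that element [1] of the output is the
# correct class, so gradient descent would push logits[0][1] higher.
loss_matching = criterion(logits, torch.tensor([1]))

# A target of 2 penalizes the same logits more, because element [2]
# is not the largest.
loss_other = criterion(logits, torch.tensor([2]))

print(loss_matching.item() < loss_other.item())  # True
```

Nothing about the network itself fixes this mapping; it emerges purely from which output element the loss is told is "correct" for each training example.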
Thank You KFrank.
That resolved my doubt and was really helpful.
I have one more question: in this case I have integer categorical labels, but they start from 1, not 0. Do I need to convert them to start from 0 so that they map to element [0] of the output?
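For what it's worth, CrossEntropyLoss expects class indices in the range [0, C-1], so 1-based labels would need shifting; a minimal sketch of one way to do it:

```python
import torch

# Hypothetical labels that start at 1 instead of 0.
target = torch.tensor([1, 3, 2])

# Shift down by one so class 1 maps to output element [0],
# class 2 to element [1], and so on.
target_zero_based = target - 1
print(target_zero_based.tolist())  # [0, 2, 1]
```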