Loss decreasing but accuracy is still the same

bing · October 1, 2020, 6:25pm

Hi Guys,

I am trying to do multi-class image classification.
I am debugging my network for potential bugs, so training and validation use the same subset of data. In that setup, the training and validation losses should decrease and then level off, which is happening. The accuracy on the validation set should also be 100% or close to it (since it is the same data as the training set), but I am getting 0% accuracy.
So I am wondering whether my accuracy calculation is correct. My code is below; could you please check it and point out any bugs I have not been able to find?

    model.eval()
    for batch_idx, (data, target) in enumerate(loader['valid']):
        # move to GPU
        if torch.cuda.is_available():
            data, target = data.cuda(), target.cuda()
        # update the average validation loss
        output = model(data).squeeze()
        output = torch.unsqueeze(output, 0)
        loss = criterion(output, target)   # changes
        valid_loss += ((1 / (batch_idx + 1)) * ((loss.data) - valid_loss))
        pred = output.data.max(1, keepdim=True)[0]
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred)).cpu().numpy()))
        total += data.size(0)
    accuracy = 100. * (correct/total)
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \t Validation Accuracy: {:.6f}'.format(
        epoch,
        train_loss,
        valid_loss,
        accuracy
        ))

Caruso · October 1, 2020, 6:51pm

Hi @bing,

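max(1, keepdim=True) returns two tensors: index [0] holds the maximum values and index [1] holds the argmax (the positions of those maxima). So pred = output.data.max(1, keepdim=True)[0] gives you the largest raw scores, and comparing those scores with integer labels almost never matches, which explains the 0% accuracy. You probably want to compare the argmax of your predictions with your target labels, so pred = output.data.max(1, keepdim=True)[1] should work.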
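For reference, here is a minimal sketch of a validation loop using the argmax fix. It assumes a classifier that outputs raw scores of shape (batch, num_classes), integer class labels, and a CrossEntropyLoss-style criterion. The function name `evaluate` and the toy data are hypothetical; `nn.Identity()` stands in for a real model so the inputs themselves act as the logits. It also uses torch.no_grad() and .item() instead of .data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device="cpu"):
    """Hypothetical helper: return (mean loss per sample, accuracy in %)."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():  # no autograd graph is needed for validation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)  # shape (batch, num_classes): raw logits
            loss = criterion(output, target)
            total_loss += loss.item() * data.size(0)  # undo the batch mean
            # max(dim=1) returns (values, indices); [1] is the predicted
            # class index, [0] would be the max score itself.
            pred = output.max(dim=1)[1]  # same as output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
    return total_loss / total, 100.0 * correct / total


# Toy check: nn.Identity() passes the inputs through unchanged as "logits".
logits = torch.tensor([[0.14, 1.2, -0.3],
                       [2.0, 0.1, 0.0],
                       [0.0, 0.5, 3.0]])
labels = torch.tensor([1, 0, 0])
loader = DataLoader(TensorDataset(logits, labels), batch_size=2)
val_loss, val_acc = evaluate(nn.Identity(), loader, nn.CrossEntropyLoss())
# Predictions are [1, 0, 2], so 2 of 3 are correct.
print(f"loss={val_loss:.4f} acc={val_acc:.2f}%")  # acc=66.67%

# The two halves of max(): the largest score vs. its position.
values, indices = logits[:1].max(dim=1)
print(values.item(), indices.item())  # about 1.2, and index 1
```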

bing · October 1, 2020, 7:13pm

Thanks @Caruso for answering.
It works, but there is a gap in my understanding that I only partly see, and I would like to clear it up.
Suppose the model predicts the following probabilities: 0.14, 1.2, -0.3.
How are the indices mapped to the labels? That is, how do we know that the first index maps to the first label, and likewise for the other indices?

KFrank · October 2, 2020, 4:51pm

Hi Bing!

Just to be precise, these are not “probabilities.” (If they were, they
would be values between 0 and 1 that sum to 1.) They are most
likely the output of a final Linear layer (with no subsequent non-linear
“activation”), and should be understood as raw-score logits.
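A short illustration with the numbers from the question, assuming PyTorch: applying softmax turns the raw logits into values in (0, 1) that sum to 1. Because softmax preserves ordering, it does not change which index is largest, so the argmax used for accuracy is the same either way.

```python
import torch

logits = torch.tensor([0.14, 1.2, -0.3])  # raw scores from a final Linear layer
probs = torch.softmax(logits, dim=0)       # now in (0, 1) and summing to 1
print(probs, probs.sum())

# Softmax is monotonic, so the predicted index does not change.
print(logits.argmax().item(), probs.argmax().item())
```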

The output of a model means whatever you train it to mean.

If your target values are “integer categorical labels,” and your
criterion (loss function) is CrossEntropyLoss, then you will
be training your model so that the [0] element of its output maps
to the class whose integer class label is 0, element [1] maps to
class label 1, and so on.
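A small check of why training ties element [k] to label k, as a sketch assuming default CrossEntropyLoss settings (no class weights, no label smoothing). For one sample, the loss equals -log(softmax(output)[target]), so it only looks at the logit selected by the integer label. Its gradient with respect to the logits is softmax(output) - one_hot(target), which is negative only at the target index, so gradient descent raises that logit and lowers the others.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()            # default: no weights, no smoothing
output = torch.tensor([[0.14, 1.2, -0.3]])   # logits for one sample, 3 classes

pairs = []
for label in range(3):
    target = torch.tensor([label])           # integer class label
    loss = criterion(output, target)
    # Same value by hand: -log(softmax(output)[label])
    manual = -F.log_softmax(output, dim=1)[0, label]
    pairs.append((loss.item(), manual.item()))
    print(label, round(loss.item(), 4), round(manual.item(), 4))

# Gradient w.r.t. the logits for label 2: softmax(output) - one_hot(2).
logits = output.clone().requires_grad_(True)
criterion(logits, torch.tensor([2])).backward()
expected_grad = torch.softmax(output, dim=1) - F.one_hot(torch.tensor([2]), 3).float()
print(logits.grad)  # negative only at index 2
```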

To reiterate, the relationship between the output of your model and
the class labels you use to train your model depends entirely on what
you train your model to do.

Best.

K. Frank

bing · October 3, 2020, 3:51pm

Thank you, KFrank.
That resolved my doubt and was really helpful.
I have one more question: my labels are integer categorical labels, but they start from 1 rather than 0. Do I need to convert them to start from 0 so that they map to the [0] element?

Caruso · October 3, 2020, 4:17pm

Hi @bing,

Sorry for not answering sooner. If you look at the documentation for CrossEntropyLoss, it says:

This criterion expects a class index in the range [0,C−1] as the target for each value of a 1D tensor of size minibatch.
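In practice this means shifting 1-based labels down by one. A minimal sketch, assuming the original labels are the integers 1 to C with no gaps (C = 3 here) and using random stand-in logits in place of a real model's output: subtract 1 before passing targets to the loss, and add 1 to the predicted index if you want to report results in the original numbering. With the unshifted labels, label 3 falls outside [0, 2], and on CPU PyTorch rejects it with an out-of-bounds error (on GPU this typically shows up as a device-side assert instead).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 3
labels_1_based = torch.tensor([1, 3, 2, 3])  # original labels in 1..C
targets = labels_1_based - 1                 # shifted into 0..C-1
assert targets.min().item() >= 0 and targets.max().item() <= num_classes - 1

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, num_classes)         # stand-in for model(data)
loss = criterion(logits, targets)

# Predicted index k corresponds to original label k + 1.
pred_labels = logits.argmax(dim=1) + 1
print(loss.item(), pred_labels.tolist())

# Without the shift, label 3 is out of range for 3 outputs.
unshifted_failed = False
try:
    criterion(logits, labels_1_based)
except (IndexError, RuntimeError) as err:
    unshifted_failed = True
    print("unshifted labels rejected:", err)
```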

Greetings.