Trouble getting probability from softmax

I am trying to get a confidence value from a model after giving it a single sample to test. I am very new to this, so I am not sure what I am doing. I read somewhere that I should use softmax to get a probability/confidence. I am using code from another implementation that doesn’t get the probability; it just returns a 1 or a 0. I am using PyTorch 0.3.

Here is my code:

    for batch_idx, (x, y) in enumerate(dataloader): #comprised of one sample
        x = Variable(x.cuda())
        y = Variable(y.cuda())

        # forward pass
        y_model = model(x)

        # loss pass
        loss = loss_fct(y_model, y).mean()

        # predict pass
        _, predicted = torch.topk(y_model, k=1)
        correct = predicted.data.eq(y.data.view_as(predicted.data)).cpu().sum()

        # metrics
        total_loss += loss.data[0] * len(y)

        total_correct += correct
        total += len(y)

        print("{} set for {} {}: Average Loss: {:.4f}, Accuracy: {:.2f}%".format(
            "Test", "benign", "null?", total_loss / total,
                               total_correct * 100. / total))
        

I am not sure what a lot of this code means, or why it was used. The code was originally taken from here:

How do I feed the model the sample, which I assume is the variable “y”, and get the confidence?

Thanks in advance.


You could apply softmax on the output of your model, if it’s raw logits. Try calling F.softmax(y_model, dim=1), which should give you the probabilities of all classes. Could you check the last layer of your model to see if it’s just a linear layer without an activation function?
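
For example, a minimal sketch (the random tensor here just stands in for your model output y_model = model(x)):

    import torch
    import torch.nn.functional as F

    # stand-in for y_model = model(x): raw logits of shape [batch_size, num_classes]
    y_model = torch.randn(1, 2)

    probs = F.softmax(y_model, dim=1)                # each row sums to 1
    confidence, predicted = torch.max(probs, dim=1)  # highest probability and its class index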


Here’s what my network looks like:

Sequential(
  (0): Linear(in_features=22761, out_features=300, bias=True)
  (1): ReLU()
  (2): Linear(in_features=300, out_features=300, bias=True)
  (3): ReLU()
  (4): Linear(in_features=300, out_features=300, bias=True)
  (5): ReLU()
  (6): Linear(in_features=300, out_features=2, bias=True)
  (7): Softmax()
)

I tried running the code you gave me and got this as the output:

Variable containing:
 0.7311  0.2689
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]

I am not sure what these two numbers mean, however. They are the same for every input. Could you please explain what is going on? Thanks!

Since your model already has a softmax layer at the end, you don’t have to use F.softmax on top of it. The outputs of your model are already “probabilities” of the classes.

However, your training might not work, depending on your loss function.
For a classification use case you would most likely use either an nn.LogSoftmax layer with nn.NLLLoss as the criterion, or raw logits, i.e. no final non-linearity, with nn.CrossEntropyLoss.
As you are currently using nn.Softmax, you would need to call torch.log on the output and feed it to nn.NLLLoss, which might be numerically unstable.
I would recommend using the raw logits + nn.CrossEntropyLoss for training, and if you really need to see the probabilities, just call F.softmax on the output as described in the other post.
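
For example, here is a minimal sketch of that raw logits + nn.CrossEntropyLoss setup, using a small stand-in model (final nn.Softmax() removed, so it returns raw logits) and dummy data, written for a recent PyTorch version without the Variable wrapper:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # stand-in model without the final Softmax layer -> returns raw logits
    model = nn.Sequential(
        nn.Linear(22761, 300),
        nn.ReLU(),
        nn.Linear(300, 2),
    )
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(1, 22761)    # dummy input, one sample
    y = torch.tensor([1])        # dummy target class index

    logits = model(x)            # raw logits of shape [1, 2]
    probs = F.softmax(logits, dim=1)  # probabilities just for inspection, not for the loss

    loss = criterion(logits, y)  # applies log_softmax + NLLLoss internally
    loss.backward()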

Thanks for the replies.

I tried running the following code for my model trained with softmax and nn.NLLLoss.

        loss_fct = nn.NLLLoss(reduce=False)
        print(loss_fct(torch.log(y_model), y))

This outputs:

Variable containing:
 16.9570
[torch.cuda.FloatTensor of size 1 (GPU 0)]

I’m not sure if NLLLoss is supposed to be used with softmax; in their code they used LogSoftmax with NLLLoss, but I changed it to Softmax to get probabilities. Does this mean I need to change the loss function to nn.CrossEntropyLoss to get the model to train correctly?

Well, I’ve tried to explain this use case in my last answer.
Basically you have these options:

  • nn.Softmax + torch.log + nn.NLLLoss -> might be numerically unstable
  • nn.LogSoftmax + nn.NLLLoss -> is perfectly fine for training; to get probabilities you would have to call torch.exp on the output
  • raw logits + nn.CrossEntropyLoss -> also perfectly fine as it calls the second approach internally; to get probabilities you would have to call torch.softmax on the output

Note that you should not feed the probabilities (i.e. the softmax output) into any of these loss functions.
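
Here is a small sketch comparing the last two options, with random tensors standing in for the model outputs and targets; both give the same loss and the same probabilities:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(4, 2)           # stand-in for raw model outputs
    target = torch.tensor([0, 1, 1, 0])  # class indices

    # nn.LogSoftmax + nn.NLLLoss (functional form)
    log_probs = F.log_softmax(logits, dim=1)
    loss_nll = F.nll_loss(log_probs, target)
    probs_nll = torch.exp(log_probs)     # back to probabilities for reporting

    # raw logits + nn.CrossEntropyLoss (functional form)
    loss_ce = F.cross_entropy(logits, target)
    probs_ce = torch.softmax(logits, dim=1)

    print(torch.allclose(loss_nll, loss_ce))    # True
    print(torch.allclose(probs_nll, probs_ce))  # True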


I understand now, thanks!

@ptrblck I see people using logits like this for KL divergence loss. Both model(x) and model(x_h) return logits of the same dimensions; applying softmax/log_softmax converts them into probabilities/log-probabilities:

    pred_x = F.softmax(model(x), dim=1)
    pred_x_h = F.log_softmax(model(x_h), dim=1)
    F.kl_div(pred_x_h, pred_x, None, None, reduction='sum')

I am new to PyTorch, so I am not sure if that’s the right thing to do?

That seems to be right. From the docs:

As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
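
For example, a minimal sketch with random stand-ins for model(x_h) and model(x):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits_x_h = torch.randn(3, 5)  # stand-in for model(x_h)
    logits_x = torch.randn(3, 5)    # stand-in for model(x)

    pred_x_h = F.log_softmax(logits_x_h, dim=1)  # input: log-probabilities
    pred_x = F.softmax(logits_x, dim=1)          # target: probabilities

    loss = F.kl_div(pred_x_h, pred_x, reduction='sum')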


Why can’t I find torch.softmax anywhere in the documentation?

It seems to be undocumented, so please stick to torch.nn.functional.softmax.

Any plans for its deprecation, similar to nn.functional.sigmoid as mentioned here?

What are typical values to get probabilities in the second case of the three you listed? Are probabilities values between 0 and 1 or between 0 and 100 (percent) in this case? I get a tensor containing two values for binary classification; how do I know which probability refers to which class label?

Every time I read your replies to others, I learn something new~~


If you apply torch.exp to your nn.LogSoftmax output, the values will be probabilities in the range [0, 1].

You define the order of the classes when you create the targets, i.e. output[0] will correspond to the class with index 0 in your target, output[1] to index 1, etc.
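
A quick sketch of both points, with random values standing in for the nn.LogSoftmax output:

    import torch
    import torch.nn.functional as F

    log_probs = F.log_softmax(torch.randn(1, 2), dim=1)  # stand-in LogSoftmax output
    probs = torch.exp(log_probs)                         # probabilities in [0, 1], rows sum to 1

    # probs[0, 0] is the probability of class index 0, probs[0, 1] of class index 1
    confidence, predicted_class = torch.max(probs, dim=1)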
