# Trouble getting probability from softmax

I am trying to get a confidence score from a model after giving it a single test sample. I am very new to this, so I am not sure what I am doing. I read somewhere that I should use softmax to get a probability/confidence. I am using code from another implementation that doesn’t return the probability; it just returns a 1 or a 0. I am using PyTorch 0.3.0.

Here is my code:

```python
for batch_idx, (x, y) in enumerate(dataloader):  # comprised of one sample
    x = Variable(x.cuda())
    y = Variable(y.cuda())

    # forward pass
    y_model = model(x)

    # loss pass
    loss = loss_fct(y_model, y).mean()

    # predict pass
    _, predicted = torch.topk(y_model, k=1)
    correct = predicted.data.eq(y.data.view_as(predicted.data)).cpu().sum()

    # metrics
    total_loss += loss.data * len(y)

    total_correct += correct
    total += len(y)

print("{} set for {} {}: Average Loss: {:.4f}, Accuracy: {:.2f}%".format(
    "Test", "benign", "null?", total_loss / total,
    total_correct * 100. / total))
```

I am not sure what a lot of this code means, or why it was used. The code was originally taken from here:

How do I feed the model the sample, which I assume is the variable “y”, and get the confidence?


You could apply softmax to the output of your model if it returns raw logits. Try calling `F.softmax(y_model, dim=1)`, which should give you the probabilities of all classes. Could you check the last layer of your model to see if it’s just a linear layer without an activation function?
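As a minimal sketch of that suggestion, assuming the model returns raw logits of shape `[batch_size, num_classes]`: the `logits` tensor below is a made-up stand-in for `y_model`, not output from the actual network.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw logits for a batch of one sample and two classes.
logits = torch.tensor([[2.0, -1.0]])

# dim=1 normalizes across the class dimension, so each row sums to 1.
probs = F.softmax(logits, dim=1)

# The largest probability can serve as a confidence, its index as the prediction.
confidence, predicted = probs.max(dim=1)
print(probs, confidence, predicted)
```

Here `confidence` is only as trustworthy as the model's calibration; it is the probability the model assigns to its own prediction.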


Here’s what my network looks like:

```
Sequential(
  (0): Linear(in_features=22761, out_features=300, bias=True)
  (1): ReLU()
  (2): Linear(in_features=300, out_features=300, bias=True)
  (3): ReLU()
  (4): Linear(in_features=300, out_features=300, bias=True)
  (5): ReLU()
  (6): Linear(in_features=300, out_features=2, bias=True)
  (7): Softmax()
)
```

I tried running the code you gave me and got this as the output:

```
Variable containing:
 0.7311  0.2689
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
```

However, I am not sure what these two numbers mean. They are the same for every input. Could you please explain what is going on? Thanks

Since your model already has a softmax layer at the end, you don’t have to use `F.softmax` on top of it. The outputs of your model are already “probabilities” of the classes.
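One possible reading of the `0.7311  0.2689` output, offered as a guess rather than a diagnosis: these are exactly the values softmax produces for the input `[1, 0]`, so they would appear if the model's own softmax layer has saturated and `F.softmax` is then applied on top of it. The tensor below is an assumed stand-in for such a saturated model output.

```python
import torch
import torch.nn.functional as F

# Assumption: the model's final Softmax layer has saturated to [1, 0].
model_output = torch.tensor([[1.0, 0.0]])

# Applying softmax a second time squashes the probabilities again.
twice = F.softmax(model_output, dim=1)
print(twice)  # tensor([[0.7311, 0.2689]])
```

If that is what happens, the second softmax hides the fact that the model is already fully confident in one class for every input.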

For a classification use case you would most likely use an `nn.LogSoftmax` layer with `nn.NLLLoss` as the criterion, or raw logits (i.e. no non-linearity) with `nn.CrossEntropyLoss`.
As you are currently using `nn.Softmax`, you would need to call `torch.log` on the output and feed it to `nn.NLLLoss`, which might be numerically unstable.
I would recommend using the raw logits + `nn.CrossEntropyLoss` for training, and if you really need to see the probabilities, just call `F.softmax` on the output as described in the other post.
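To make the stability concern concrete, here is a small sketch with made-up logits that have a large gap between the two classes. In float32 the softmax probability of the smaller class underflows to zero, so taking the log afterwards gives `-inf`, while `F.log_softmax` stays finite.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits with a large gap between the two classes.
logits = torch.tensor([[0.0, 200.0]])

# softmax underflows to exactly 0 for class 0 in float32, so the log is -inf.
unstable = torch.log(F.softmax(logits, dim=1))

# log_softmax works in log space and keeps the finite value -200.
stable = F.log_softmax(logits, dim=1)

print(unstable)
print(stable)
```

A loss computed from the first variant would be infinite whenever the target is class 0, which is why the log-space versions are usually preferred.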

Thanks for the replies.

I tried running the following code for my model trained with softmax and nn.NLLLoss.

```python
loss_fct = nn.NLLLoss(reduce=False)
print(loss_fct(torch.log(y_model), y))
```

This outputs:

```
Variable containing:
 16.9570
[torch.cuda.FloatTensor of size 1 (GPU 0)]
```

I’m not sure if `NLLLoss` is supposed to be used with softmax. In their code they used `LogSoftmax` with `NLLLoss`, but I changed it to softmax to get probabilities. Does this mean I need to change the loss function to `nn.CrossEntropyLoss` to get the model to train correctly?

Well, I’ve tried to explain this use case in my last answer.
Basically you have these options:

• `nn.Softmax` + `torch.log` + `nn.NLLLoss` -> might be numerically unstable
• `nn.LogSoftmax` + `nn.NLLLoss` -> is perfectly fine for training; to get probabilities you would have to call `torch.exp` on the output
• raw logits + `nn.CrossEntropyLoss` -> also perfectly fine as it calls the second approach internally; to get probabilities you would have to call `torch.softmax` on the output

Note that you should not feed probabilities (the output of softmax) directly to any of these loss functions.
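The three options can be compared side by side in a short sketch. The logits and targets are random, made-up values; for well-behaved inputs like these all three losses agree, and the probabilities recovered via `torch.exp` and `F.softmax` match as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)            # hypothetical raw model outputs
target = torch.tensor([0, 1, 1, 0])   # hypothetical class indices

# Option 1: Softmax + torch.log + NLLLoss (can be numerically unstable)
loss_softmax_log = nn.NLLLoss()(torch.log(F.softmax(logits, dim=1)), target)

# Option 2: LogSoftmax + NLLLoss
log_probs = F.log_softmax(logits, dim=1)
loss_nll = nn.NLLLoss()(log_probs, target)

# Option 3: raw logits + CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, target)

# Probabilities recovered from options 2 and 3.
probs_from_log = torch.exp(log_probs)
probs_from_logits = F.softmax(logits, dim=1)

print(loss_softmax_log.item(), loss_nll.item(), loss_ce.item())
```

The difference between the options only shows up for extreme logits, where the first variant can produce infinite or NaN losses.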


I understand now, Thanks

@ptrblck I see people using logits like this for KL divergence loss.
The model outputs for `x` and `x_h` are logits of the same dimensions; `F.softmax` converts the first into probabilities and `F.log_softmax` converts the second into log-probabilities:
`pred_x = F.softmax(model(x), dim=1)`
`pred_x_h = F.log_softmax(model(x_h), dim=1)`
`F.kl_div(pred_x_h, pred_x, None, None, reduction='sum')`

I am new to PyTorch and not sure if that’s the right thing to do.

That seems to be right. From the docs:

> As with `NLLLoss`, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
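As a quick check of that convention, the sketch below compares `F.kl_div` against a manual computation of the sum of `target * (log(target) - input)`. The two logit tensors are random placeholders standing in for `model(x)` and `model(x_h)`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits_x = torch.randn(3, 5)     # stands in for model(x)
logits_x_h = torch.randn(3, 5)   # stands in for model(x_h)

target_probs = F.softmax(logits_x, dim=1)            # target: probabilities
input_log_probs = F.log_softmax(logits_x_h, dim=1)   # input: log-probabilities

kl = F.kl_div(input_log_probs, target_probs, reduction='sum')

# Manual KL(target || input) = sum target * (log target - log input)
log_target = F.log_softmax(logits_x, dim=1)
manual = (target_probs * (log_target - input_log_probs)).sum()

print(kl.item(), manual.item())
```

Note the argument order: the log-probabilities go first and the target probabilities second, which is the reverse of how KL divergence is often written on paper.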


Why can’t I find `torch.softmax` anywhere in the documentation?

It seems to be undocumented, so please stick to `torch.nn.functional.softmax`.

Are there any plans to deprecate it, similar to `nn.functional.sigmoid`, as mentioned here

What are typical values when getting probabilities in the second of the three cases you listed? Are the probabilities between 0 and 1, or between 0 and 100 (percent)? I get a tensor containing two values for binary classification; how do I know which probability refers to which class label?

If you apply `torch.exp` to your `nn.LogSoftmax` output, the values will be in the range `[0, 1]`.
You define the order of the classes when creating the target, i.e. `output[:, 0]` will correspond to the class with index 0 in your target, `output[:, 1]` to index 1, etc.
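A small sketch of this mapping, using made-up logits for three samples and two classes: exponentiating the `nn.LogSoftmax` output gives per-class probabilities between 0 and 1 that sum to 1 in each row, and the column index is the class index used in the target.

```python
import torch
import torch.nn as nn

# Hypothetical logits for 3 samples and 2 classes.
logits = torch.tensor([[2.0, 0.0], [0.5, 1.0], [-1.0, 3.0]])
output = nn.LogSoftmax(dim=1)(logits)

probs = torch.exp(output)        # each value lies in [0, 1]
p_class0 = probs[:, 0]           # probability of the class labeled 0 in the target
p_class1 = probs[:, 1]           # probability of the class labeled 1 in the target
predicted = probs.argmax(dim=1)  # predicted class index per sample

print(probs)
print(predicted)
```

Multiplying `probs` by 100 would give percentages if that presentation is preferred.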