I am trying to get a confidence score from a model after giving it a single sample to test. I am very new to this, so I am not sure what I am doing. I read somewhere that I should use softmax to get a probability/confidence. I am using code from another implementation that doesn’t return the probability, it just returns a 1 or a 0. I am using PyTorch 0.3.0.
Here is my code:
for batch_idx, (x, y) in enumerate(dataloader):  # comprised of one sample
    x = Variable(x.cuda())
    y = Variable(y.cuda())
    # forward pass
    y_model = model(x)
    # loss pass
    loss = loss_fct(y_model, y).mean()
    # predict pass
    _, predicted = torch.topk(y_model, k=1)
    correct = predicted.data.eq(y.data.view_as(predicted.data)).cpu().sum()
    # metrics
    total_loss += loss.data[0] * len(y)
    total_correct += correct
    total += len(y)

print("{} set for {} {}: Average Loss: {:.4f}, Accuracy: {:.2f}%".format(
    "Test", "benign", "null?", total_loss / total,
    total_correct * 100. / total))
I am not sure what a lot of this code means, or why it was used. The code was originally taken from here:
How do I feed the model the sample, which I assume is the variable “y”, and get the confidence?
You could apply softmax on the output of your model, if it’s raw logits. Try to call F.softmax(y_model, dim=1), which should give you the probabilities of all classes. Could you check the last layer of your model to see if it’s just a linear layer without an activation function?
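A minimal sketch of that (assuming y_model contains raw logits of shape [batch_size, num_classes]):

import torch
import torch.nn.functional as F

probs = F.softmax(y_model, dim=1)                # each row now sums to 1
confidence, predicted = torch.max(probs, dim=1)  # top probability and its class index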
Since your model already has a softmax layer at the end, you don’t have to use F.softmax on top of it. The outputs of your model are already “probabilities” of the classes.
However, your training might not work, depending on your loss function.
For a classification use case you would most likely use an nn.LogSoftmax layer with nn.NLLLoss as the criterion, or raw logits (i.e. no final non-linearity) with nn.CrossEntropyLoss.
As you are currently using nn.Softmax, you would need to call torch.log on the output and feed it to nn.NLLLoss, which might be numerically unstable.
I would recommend using the raw logits + nn.CrossEntropyLoss for training, and if you really need to see the probabilities, just call F.softmax on the output as described in the other post.
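As a rough sketch of that setup (model and loss_fct are placeholder names, assuming the last layer is a plain nn.Linear):

import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

output = model(x)                 # raw logits, no final non-linearity
loss = criterion(output, y)       # applies log-softmax + NLLLoss internally
probs = F.softmax(output, dim=1)  # probabilities for inspection only, never for the loss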
Variable containing:
16.9570
[torch.cuda.FloatTensor of size 1 (GPU 0)]
I’m not sure if NLLLoss is supposed to be used with softmax; in their code they used nn.LogSoftmax with nn.NLLLoss, but I changed it to nn.Softmax to get probabilities. Does this mean I need to change the loss function to nn.CrossEntropyLoss to get the model to train correctly?
Well, I’ve tried to explain this use case in my last answer.
Basically you have these options:
nn.Softmax + torch.log + nn.NLLLoss -> might be numerically unstable
nn.LogSoftmax + nn.NLLLoss -> is perfectly fine for training; to get probabilities you would have to call torch.exp on the output
raw logits + nn.CrossEntropyLoss -> also perfectly fine as it calls the second approach internally; to get probabilities you would have to call torch.softmax on the output
Note that you should not feed the probabilities (using softmax) to any loss function.
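To make the two numerically stable options concrete, here is a small sketch (logits is just a placeholder for your model output):

import torch
import torch.nn.functional as F

logits = model(x)                         # raw scores, no final activation

# option 2: log-softmax + NLL
log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, y)
probs = torch.exp(log_probs)              # probabilities, if you need them

# option 3: raw logits + cross entropy (computes option 2 internally)
loss_ce = F.cross_entropy(logits, y)      # same value as loss_nll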
@ptrblck I see people using logits like this for KL divergence loss:
both pred_x and pred_x_h are logits of the same dimensions; applying softmax converts them into probabilities.

pred_x = F.softmax(model(x), dim=1)
pred_x_h = F.log_softmax(model(x_h), dim=1)
F.kl_div(pred_x_h, pred_x, reduction='sum')

I am new to PyTorch, so I'm not sure if that's the right thing to do?
From the KLDivLoss docs: as with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
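So your snippet matches that expectation: the first argument gets log-probabilities, the second probabilities. A minimal sketch (x and x_h as in your code):

import torch.nn.functional as F

log_q = F.log_softmax(model(x_h), dim=1)  # input: log-probabilities
p = F.softmax(model(x), dim=1)            # target: probabilities
loss = F.kl_div(log_q, p, reduction='sum')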
What are typical values to get probabilities in the second case of the three you listed? Are probabilities values between 0 and 1 or between 0 and 100 (percent) in this case? I get a tensor containing two values for binary classification; how do I know which probability refers to which class label?
If you apply torch.exp on your nn.LogSoftmax output, the values will be probabilities in the range [0, 1].
You define the order of the classes by creating the target, i.e. output[:, 0] will correspond to the class with index 0 in your target, output[:, 1] to index 1, etc.
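In code that could look like this for your binary case (assuming an nn.LogSoftmax output of shape [1, 2]):

import torch

log_probs = model(x)          # nn.LogSoftmax output, shape [1, 2]
probs = torch.exp(log_probs)  # probabilities in [0, 1], the row sums to 1
prob_class0 = probs[0, 0]     # class with target index 0
prob_class1 = probs[0, 1]     # class with target index 1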