Trouble getting probability from softmax

I am trying to get a confidence value from a model after giving it a single sample to test. I am very new to this, so I am not sure what I am doing. I read somewhere that I should use softmax to get a probability/confidence. I am using code from another implementation that doesn’t get the probability; it just returns a 1 or a 0. I am using PyTorch 0.3.

Here is my code:

    for batch_idx, (x, y) in enumerate(dataloader): #comprised of one sample
        x = Variable(x.cuda())
        y = Variable(y.cuda())

        # forward pass
        y_model = model(x)

        # loss pass
        loss = loss_fct(y_model, y).mean()

        # predict pass
        _, predicted = torch.topk(y_model, k=1)
        correct = predicted.data.eq(y.data.view_as(predicted.data)).cpu().sum()

        # metrics
        total_loss += loss.data[0] * len(y)

        total_correct += correct
        total += len(y)

        print("{} set for {} {}: Average Loss: {:.4f}, Accuracy: {:.2f}%".format(
            "Test", "benign", "null?", total_loss / total,
                               total_correct * 100. / total))
        

I am not sure what a lot of this code means, or why it was used. The code was originally taken from here:

How do I feed the model the sample, which I assume is the variable “y”, and get the confidence?

Thanks in advance.


You could apply softmax on the output of your model, if it’s raw logits. Try calling F.softmax(y_model, dim=1), which should give you the probabilities of all classes. Could you check the last layer of your model to see if it’s just a linear layer without an activation function?
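
For example, a minimal sketch (the random tensor here just stands in for your model output y_model = model(x)):

    import torch
    import torch.nn.functional as F

    # stand-in for y_model = model(x): raw logits of shape [batch_size, num_classes]
    y_model = torch.randn(1, 2)

    probs = F.softmax(y_model, dim=1)                # each row sums to 1
    confidence, predicted = torch.max(probs, dim=1)  # highest probability and its class index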


Here’s what my network looks like:

Sequential(
  (0): Linear(in_features=22761, out_features=300, bias=True)
  (1): ReLU()
  (2): Linear(in_features=300, out_features=300, bias=True)
  (3): ReLU()
  (4): Linear(in_features=300, out_features=300, bias=True)
  (5): ReLU()
  (6): Linear(in_features=300, out_features=2, bias=True)
  (7): Softmax()
)

I tried running the code you gave me and got this as the output:

Variable containing:
 0.7311  0.2689
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]

I am not sure what these two numbers mean, however. They are the same for every input. Could you please explain what is going on? Thanks!

Since your model already has a softmax layer at the end, you don’t have to use F.softmax on top of it. The outputs of your model are already “probabilities” of the classes.

However, your training might not work, depending on your loss function.
For a classification use case you would most likely use either an nn.LogSoftmax layer with nn.NLLLoss as the criterion, or raw logits, i.e. no final non-linearity, with nn.CrossEntropyLoss.
As you are currently using nn.Softmax, you would need to call torch.log on the output and feed it to nn.NLLLoss, which might be numerically unstable.
I would recommend using the raw logits + nn.CrossEntropyLoss for training, and if you really need to see the probabilities, just call F.softmax on the output as described in the other post.
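
For example, here is a minimal sketch of that raw logits + nn.CrossEntropyLoss setup, using a small stand-in model (final nn.Softmax() removed, so it returns raw logits) and dummy data, written for a recent PyTorch version without the Variable wrapper:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # stand-in model without the final Softmax layer -> returns raw logits
    model = nn.Sequential(
        nn.Linear(22761, 300),
        nn.ReLU(),
        nn.Linear(300, 2),
    )
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(1, 22761)    # dummy input, one sample
    y = torch.tensor([1])        # dummy target class index

    logits = model(x)            # raw logits of shape [1, 2]
    probs = F.softmax(logits, dim=1)  # probabilities just for inspection, not for the loss

    loss = criterion(logits, y)  # applies log_softmax + NLLLoss internally
    loss.backward()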

Thanks for the replies.

I tried running the following code for my model trained with softmax and nn.NLLLoss.

        loss_fct = nn.NLLLoss(reduce=False)
        print(loss_fct(torch.log(y_model), y))

This outputs:

Variable containing:
 16.9570
[torch.cuda.FloatTensor of size 1 (GPU 0)]

I’m not sure if NLLLoss is supposed to be used with softmax; in their code they used LogSoftmax with NLLLoss, but I changed it to Softmax to get probabilities. Does this mean I need to change the loss function to nn.CrossEntropyLoss to get the model to train correctly?

Well, I’ve tried to explain this use case in my last answer.
Basically you have these options:

  • nn.Softmax + torch.log + nn.NLLLoss -> might be numerically unstable
  • nn.LogSoftmax + nn.NLLLoss -> is perfectly fine for training; to get probabilities you would have to call torch.exp on the output
  • raw logits + nn.CrossEntropyLoss -> also perfectly fine as it calls the second approach internally; to get probabilities you would have to call torch.softmax on the output

Note that you should not feed the probabilities (i.e. the softmax output) into any of these loss functions.
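
Here is a small sketch comparing the last two options, with random tensors standing in for the model outputs and targets; both give the same loss and the same probabilities:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(4, 2)           # stand-in for raw model outputs
    target = torch.tensor([0, 1, 1, 0])  # class indices

    # nn.LogSoftmax + nn.NLLLoss (functional form)
    log_probs = F.log_softmax(logits, dim=1)
    loss_nll = F.nll_loss(log_probs, target)
    probs_nll = torch.exp(log_probs)     # back to probabilities for reporting

    # raw logits + nn.CrossEntropyLoss (functional form)
    loss_ce = F.cross_entropy(logits, target)
    probs_ce = torch.softmax(logits, dim=1)

    print(torch.allclose(loss_nll, loss_ce))    # True
    print(torch.allclose(probs_nll, probs_ce))  # True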


I understand now, thanks!

@ptrblck I see people using logits like this for KL divergence loss. Both model(x) and model(x_h) return logits of the same dimensions; applying softmax/log_softmax converts them into probabilities/log-probabilities:

    pred_x = F.softmax(model(x), dim=1)
    pred_x_h = F.log_softmax(model(x_h), dim=1)
    F.kl_div(pred_x_h, pred_x, None, None, reduction='sum')

I am new to PyTorch, so I am not sure if that’s the right thing to do?

That seems to be right. From the docs:

As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
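
For example, a minimal sketch with random stand-ins for model(x_h) and model(x):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits_x_h = torch.randn(3, 5)  # stand-in for model(x_h)
    logits_x = torch.randn(3, 5)    # stand-in for model(x)

    pred_x_h = F.log_softmax(logits_x_h, dim=1)  # input: log-probabilities
    pred_x = F.softmax(logits_x, dim=1)          # target: probabilities

    loss = F.kl_div(pred_x_h, pred_x, reduction='sum')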


Why can’t I find torch.softmax anywhere in the documentation?

It seems to be undocumented, so please stick to torch.nn.functional.softmax.

Any plans for its deprecation, similar to nn.functional.sigmoid as mentioned here?

What are typical values to get probabilities in the second case of the three you listed? Are probabilities values between 0 and 1 or between 0 and 100 (percent) in this case? I get a tensor containing two values for binary classification; how do I know which probability refers to which class label?

Every time I read your replies to others, I learn something new~~


If you apply torch.exp to your nn.LogSoftmax output, the values will be probabilities in the range [0, 1].

You define the order of the classes when you create the targets, i.e. output[0] will correspond to the class with index 0 in your target, output[1] to index 1, etc.
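
A quick sketch of both points, with random values standing in for the nn.LogSoftmax output:

    import torch
    import torch.nn.functional as F

    log_probs = F.log_softmax(torch.randn(1, 2), dim=1)  # stand-in LogSoftmax output
    probs = torch.exp(log_probs)                         # probabilities in [0, 1], rows sum to 1

    # probs[0, 0] is the probability of class index 0, probs[0, 1] of class index 1
    confidence, predicted_class = torch.max(probs, dim=1)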
