Softmax outputting 0 or 1 instead of probabilities

I am using a pre-trained network with an nn.BCEWithLogitsLoss() criterion for a multi-label problem. I want the network output as probabilities, but after applying Softmax I am getting outputs of exactly 0 or 1, which is confusing: Softmax should not output a perfect 0 or 1 for any class, it should output probabilities across the classes.

Below is an image of my code: [screenshot not included]

Below is an image of the output: [screenshot not included]

The softmax operation might output values at (or close to) these discrete values if a particular logit in the input activation is relatively large and positive, as seen here:

import torch
import torch.nn.functional as F

x = torch.randn(1, 10)     # random logits for a single sample
out = F.softmax(x, dim=1)
print(out)
> tensor([[0.1612, 0.1486, 0.1232, 0.0626, 0.0162, 0.3084, 0.0166, 0.0811, 0.0098,
         0.0723]])

x[0, 1] = 1000.            # make one logit much larger than the rest
out = F.softmax(x, dim=1)
print(out)
> tensor([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]])

If you are using nn.BCEWithLogitsLoss, I assume you are working on a multi-label classification use case. If that’s the case, you should remove the softmax and pass the raw logits to this criterion, as internally log_sigmoid will be applied.
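
For instance, a minimal sketch of that training setup (the linear model, shapes, and multi-hot targets below are made up for illustration, not the original code):

import torch
import torch.nn as nn

model = nn.Linear(20, 10)                       # stand-in for the pre-trained network
x = torch.randn(4, 20)                          # 4 samples, 20 input features
targets = torch.randint(0, 2, (4, 10)).float()  # multi-hot targets for 10 labels

criterion = nn.BCEWithLogitsLoss()

logits = model(x)                  # raw logits, no softmax here
loss = criterion(logits, targets)
loss.backward()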


Hi @ptrblck,
Yes, it is a multi-label classification problem. Is there a way to convert the logits into probabilities, since the softmax is outputting 0 and 1 for all the observations?
I want to use a cutoff point to choose the labels instead of the top-k classes, so what should I do to convert the output into probabilities? I have tried taking the exp of the logits, but their sum is substantially greater than 1.
Should I use a different loss to get the probabilities?

For a multi-label classification you would apply sigmoid to the outputs to get the probability for each class separately.
Note that nn.BCEWithLogitsLoss still expects raw logits.
You could apply the sigmoid and use nn.BCELoss instead, but this would reduce the numerical stability.
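
As a rough sketch of that inference step (the shapes and the 0.5 cutoff are assumptions for illustration):

import torch

logits = torch.randn(4, 10)      # raw model outputs for 4 samples, 10 labels
probs = torch.sigmoid(logits)    # independent probability per label
predicted = probs > 0.5          # cutoff threshold instead of topk
print(probs)
print(predicted)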

Using BCELoss throws an error:

While the same network works with nn.BCEWithLogitsLoss.

One of the tensors (model output, target or weight) is a DoubleTensor, while a FloatTensor is expected, so you would have to transform it via tensor = tensor.float().
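
For example, a made-up reproduction of that dtype mismatch, assuming the target was built from a NumPy float64 array (a common source of DoubleTensors):

import numpy as np
import torch
import torch.nn as nn

criterion = nn.BCELoss()
probs = torch.sigmoid(torch.randn(4, 10))      # FloatTensor model output
targets = torch.from_numpy(np.ones((4, 10)))   # float64 array -> DoubleTensor

# criterion(probs, targets)                # raises the dtype mismatch error
loss = criterion(probs, targets.float())   # cast the target to FloatTensor
print(loss)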

If your numbers are huge, like torch.tensor([748., 1028., 2047.]), then exp(748) will give you a very large number, large enough to cause overflow. In such cases the output probabilities look like [0., 0., 1.].
But if you somehow normalize your input to a calculable range, then it will give you probabilities.
This is what I have seen in my case (yes, ptrblck is right).
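
A quick illustration of this (the division by 1000 is an arbitrary rescaling just to bring the logits into a smaller range):

import torch
import torch.nn.functional as F

x = torch.tensor([[748., 1028., 2047.]])
print(F.softmax(x, dim=1))           # tensor([[0., 0., 1.]]) -- the result saturates

x_scaled = x / 1000.                 # rescale the logits into a smaller range
print(F.softmax(x_scaled, dim=1))    # non-degenerate probabilities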