Softmax gives an output vector whose sum is greater than 1

I am a newbie to PyTorch. I was trying out the following network architecture to train a multi-class classifier, with softmax at the output layer and cross-entropy as the loss function. However, the output doesn't look like probabilities. For example, one of the outputs looks like [2.0032e-10, 1.798e-8, …1.0000e+0,…2.112e-4]. My question is: how can one of the entries be 1 when the whole vector has to sum to 1?

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 13 features in input layer
        self.fc1 = nn.Linear(13, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, 512)
        self.fc4 = nn.Linear(512, 512)
        self.fc5 = nn.Linear(512, 40)
        self.bn1 = nn.BatchNorm1d(512)
        self.bn2 = nn.BatchNorm1d(512)
        self.bn3 = nn.BatchNorm1d(512)
        self.bn4 = nn.BatchNorm1d(512)

    def forward(self, x):
        x = self.bn1(F.relu(self.fc1(x)))
        x = self.bn2(F.relu(self.fc2(x)))
        x = self.bn3(F.relu(self.fc3(x)))
        x = self.bn4(F.relu(self.fc4(x)))
        x = F.softmax(self.fc5(x), dim=1)
        return x

Please correct me if I am wrong and help me out.

You are most likely seeing floating point precision issues in how the values are printed.
That being said, note that nn.CrossEntropyLoss expects logits, since F.log_softmax and nn.NLLLoss are applied internally, so you should remove the softmax for this criterion.
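A minimal sketch of that setup (the layer sizes and batch size are just for illustration): the model ends in a plain linear layer and the raw logits go straight into the criterion.

```python
import torch
import torch.nn as nn

# Toy model that ends in a plain Linear layer -- no softmax at the end
model = nn.Sequential(nn.Linear(13, 32), nn.ReLU(), nn.Linear(32, 40))
criterion = nn.CrossEntropyLoss()  # applies log_softmax + NLLLoss internally

x = torch.randn(8, 13)             # batch of 8 samples, 13 features
target = torch.randint(0, 40, (8,))  # class indices in [0, 40)

logits = model(x)                  # raw scores, may be negative
loss = criterion(logits, target)   # pass logits, not probabilities
loss.backward()
```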


If softmax is removed, the output range is no longer between 0 and 1; it contains negative values as well, like [-12.098, 2.0988, -12.121…, 0.87, 0.21]. But I need probabilities for each of the classes.

nn.CrossEntropyLoss expects these logits.
For debugging purposes you could still apply softmax on the output. Just don’t pass it to the criterion.
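For example (a small sketch with made-up logits for one sample): apply softmax outside the model, only for inspection. This also shows why a dominant logit can print as 1.0000e+00 even though the exact probability is slightly below 1 -- the remaining mass hides below the displayed precision.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[12.0, -3.0, 0.5, -8.0]])  # made-up logits, one sample

probs = F.softmax(logits, dim=1)  # for debugging/inspection only,
                                  # do NOT pass this to nn.CrossEntropyLoss
print(probs)                      # the largest entry displays as ~1.0000e+00
print(probs.sum(dim=1))           # each row sums to 1 up to float precision
```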

So how do I get the class probabilities? I need the output as probabilities. How can I achieve that with or without softmax?

Solved. I got the output as probabilities by passing the predicted logits through softmax, but didn't include softmax in the network architecture, since nn.CrossEntropyLoss already applies log_softmax internally. Thank you.
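The resolved pattern looks roughly like this (a sketch; the model, sizes, and data are placeholders): train on logits, and convert to probabilities only at inference time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 40))

# Training: logits go straight into the criterion
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(16, 13), torch.randint(0, 40, (16,))
loss = criterion(model(x), y)
loss.backward()

# Inference: softmax applied outside the model to get class probabilities
model.eval()
with torch.no_grad():
    probs = F.softmax(model(x), dim=1)
    pred = probs.argmax(dim=1)  # predicted class per sample
```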