Softmax not summing to 1

I am using a softmax layer at the end of my fully connected network, but its output does not sum to 1. This is an issue because the sum tends to increase in proportion to the number of epochs. I am not sure why this is happening; here is the code.

import torch
import torch.nn as nn
import numpy as np


class NN(nn.Module):
    def __init__(self):
        super(NN, self).__init__()

        self.fc1 = nn.Linear(1024, 128)
        self.fc2 = nn.Linear(128, 128)
        self.softmax = nn.Softmax(0)

    def forward(self, input):
        output = self.fc1(input)
        output = self.fc2(output)
        ouput = self.softmax(output)
        return output



def train(x_tensor, y_tensor):
    output = rnn(x_tensor.view(-1))
    output = output.unsqueeze(0)
    y_tensor = y_tensor[0].unsqueeze(0)
    y_guess = np.argmax(np.array(y_tensor))  #.long()
    y_guess = torch.from_numpy(np.array(y_guess)).unsqueeze(0)
    loss = criterion(output, y_guess)  
    loss.backward()
    optimizer.step()
    return output, loss.item()


rnn = NN()
lr = 0.001
optimizer = torch.optim.SGD(rnn.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

Hi,

It should sum to 1 in most cases.
Are the individual values very different / very small / very large?
Could you provide an example input where you see this problem so that we can reproduce it?
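
For reference, a minimal sanity check (a sketch using a random 1-D tensor, not your actual data) shows that nn.Softmax over dim 0 should sum to 1:

import torch
import torch.nn as nn

softmax = nn.Softmax(dim=0)
x = torch.randn(128)       # random logits, same size as the fc2 output
probs = softmax(x)
print(probs.sum())         # tensor(1.0000), up to floating point error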

Even from the start it is not summing to 1.

Sample output after a few thousand epochs:

tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0154, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0938, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.7029, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
203.8531, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.8635, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000]])

What are the inputs to the softmax function that return such values?

Not sure what was happening, but it is fixed now.

Unrelated to your question, but note that nn.CrossEntropyLoss expects logits as the model output, not probabilities coming from softmax.
Internally, F.log_softmax and nn.NLLLoss will be used, so you can just remove the softmax as the output activation.
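
A minimal sketch of what that could look like, reusing the layer sizes from the code above (the dummy input and target below are placeholders, not your actual data):

import torch
import torch.nn as nn

class NN(nn.Module):
    def __init__(self):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(1024, 128)
        self.fc2 = nn.Linear(128, 128)
        # no softmax here: the model returns raw logits

    def forward(self, input):
        output = self.fc1(input)
        output = self.fc2(output)
        return output

model = NN()
criterion = nn.CrossEntropyLoss()   # applies log_softmax + NLLLoss internally

logits = model(torch.randn(1024)).unsqueeze(0)   # shape [1, 128]
target = torch.tensor([5])                       # class index, shape [1]
loss = criterion(logits, target)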

Also note that you can call torch.argmax directly without converting to numpy and back to PyTorch.
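
For example (assuming y_tensor[0] is a one-hot or score vector, as the numpy conversion above suggests):

y_guess = torch.argmax(y_tensor[0]).unsqueeze(0)   # class index tensor of shape [1]
loss = criterion(output, y_guess)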

Thank you for this advice. What is happening now is that after a few thousand epochs the net just returns all 0's…