Network output with 100% probability on one class

I'm quite new to PyTorch and DL, but I'm trying to build an MLP with 256 classes (the possible values of one byte). The desired output is a tensor of length 256 containing a probability for each class, so the output tensor should sum to 1 (right?), which is why we use Softmax. In our model, softmax is the activation function on the output layer, and as the loss function we use nn.NLLLoss()(torch.log(y), y_target).
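
For reference, a minimal sketch of the setup as described (the input size, hidden size, and batch here are placeholders, not our actual model):

```python
import torch
import torch.nn as nn

# Placeholder MLP: input and hidden sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(16, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
    nn.Softmax(dim=1),   # probabilities over the 256 classes, each row sums to 1
)

x = torch.randn(8, 16)                        # dummy batch
y = model(x)                                  # shape (8, 256)
y_target = torch.randint(0, 256, (8,))        # class indices 0..255
loss = nn.NLLLoss()(torch.log(y), y_target)   # log of softmax probs -> NLLLoss
```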

During training, the output of the network has values between 0 and 1 for each class, but when testing the network after training is complete, the output is 1 (100% probability) for a single class. It is not the same class that gets 1 for different inputs, but each time the network seems to be completely sure about one class. It only picks the correct class less than 1% of the time.

This seems a bit strange; it should not behave like this, right? Do you have any idea what is going wrong here?

Besides the issue of the “overconfident” model predicting a single class (I don’t know what’s causing that), I would recommend using F.log_softmax instead of torch.log(torch.softmax()) for better numerical stability.
Alternatively, you could also pass the outputs of the last layer (the raw logits) directly to nn.CrossEntropyLoss.
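
For example, either of these variants avoids computing an explicit log of the softmax output (a rough sketch; the model and data below are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Note: no Softmax at the end, the model outputs raw logits.
model = nn.Sequential(
    nn.Linear(16, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
)

x = torch.randn(8, 16)
y_target = torch.randint(0, 256, (8,))
logits = model(x)

# Option 1: log_softmax + NLLLoss (numerically stable log-probabilities)
loss1 = nn.NLLLoss()(F.log_softmax(logits, dim=1), y_target)

# Option 2: CrossEntropyLoss applied directly to the logits
# (internally equivalent to log_softmax + NLLLoss)
loss2 = nn.CrossEntropyLoss()(logits, y_target)

# If you need predicted probabilities at inference time, apply softmax explicitly:
probs = F.softmax(logits, dim=1)
```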