Network output with 100% probability on one class

I'm quite new to PyTorch and DL, but I'm trying to build an MLP with 256 classes (the possible values of one byte). The desired output is a tensor of length 256 containing a probability for each class, so the output tensor should sum to 1 (right?), which is why we use Softmax. In our model, softmax is the activation function on the output layer, and as the loss function we use nn.NLLLoss()(torch.log(y), y_target).
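
For reference, a minimal sketch of the setup as described (the input size, hidden size, and batch here are placeholders, not our actual model):

```python
import torch
import torch.nn as nn

# Placeholder MLP: input and hidden sizes are made up for illustration.
model = nn.Sequential(
    nn.Linear(16, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
    nn.Softmax(dim=1),   # probabilities over the 256 classes, each row sums to 1
)

x = torch.randn(8, 16)                        # dummy batch
y = model(x)                                  # shape (8, 256)
y_target = torch.randint(0, 256, (8,))        # class indices 0..255
loss = nn.NLLLoss()(torch.log(y), y_target)   # log of softmax probs -> NLLLoss
```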

During training, the output of the network has values between 0 and 1 for each class, but when testing the network after training is complete, the output is 1 (100% probability) for a single class. It is not the same class that gets 1 for different inputs, but each time the network seems to be completely sure about one class. It only picks the correct class less than 1% of the time.

This seems a bit strange; it should not behave like this, right? Do you have any idea what is going wrong here?

Besides the issue of the “overconfident” model predicting a single class (I don’t know what’s causing that), I would recommend using F.log_softmax instead of torch.log(torch.softmax()) for better numerical stability.
Alternatively, you could also pass the outputs of the last layer (the raw logits) directly to nn.CrossEntropyLoss.
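
For example, either of these variants avoids computing an explicit log of the softmax output (a rough sketch; the model and data below are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Note: no Softmax at the end, the model outputs raw logits.
model = nn.Sequential(
    nn.Linear(16, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
)

x = torch.randn(8, 16)
y_target = torch.randint(0, 256, (8,))
logits = model(x)

# Option 1: log_softmax + NLLLoss (numerically stable log-probabilities)
loss1 = nn.NLLLoss()(F.log_softmax(logits, dim=1), y_target)

# Option 2: CrossEntropyLoss applied directly to the logits
# (internally equivalent to log_softmax + NLLLoss)
loss2 = nn.CrossEntropyLoss()(logits, y_target)

# If you need predicted probabilities at inference time, apply softmax explicitly:
probs = F.softmax(logits, dim=1)
```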