I'm quite new to PyTorch and DL, but I'm trying to build an MLP with 256 classes (the possible values of one byte). The desired output is a tensor of length 256 with a probability for each class, so the output should sum to 1 (right?), which is why we use Softmax. In our model, Softmax is the activation function on the output layer, and for the loss function we use nn.NLLLoss()(torch.log(y), y_target).
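Here is a minimal sketch of the setup described above. The layer sizes and input dimension are made up for illustration; our actual MLP is different, but the Softmax output plus NLLLoss-on-log structure is the same:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration -- the real network differs.
model = nn.Sequential(
    nn.Linear(8, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
    nn.Softmax(dim=1),  # output activation: each row is a probability vector
)

criterion = nn.NLLLoss()

x = torch.randn(4, 8)                    # dummy batch of 4 inputs
y_target = torch.randint(0, 256, (4,))   # dummy byte-valued targets

y = model(x)                              # shape (4, 256), rows sum to ~1
loss = criterion(torch.log(y), y_target)  # NLLLoss expects log-probabilities
```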
During training, the network's output has values between 0 and 1 for each class, but when testing the network after training completes, the output is 1 (100% probability) for a single class. It is not the same class that gets 1 for different inputs, but each time the network seems completely certain about one class. It picks the correct class less than 1% of the time.
This seems strange; it shouldn't behave like this, right? Do you have any idea what is wrong here?