When I train my model for many epochs, the prediction outputs are not a probability distribution

Could you print the shape of the output?
If you are dealing with a classification problem, it should have the shape [batch_size, nb_classes], and your softmax should calculate the probabilities along dim=1 (the class dimension).
Based on the comments in your code, it looks like you are dealing with a tensor of shape [12]?
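Here is a minimal sketch, assuming PyTorch and a made-up batch of logits. The sizes 4 and 3 and the names `logits`, `probs`, `wrong`, `single`, and `probs_single` are hypothetical. It shows how to check the output shape and why softmax should normalize over the class dimension. If your output really is a 1-D tensor of 12 class scores for a single sample, adding a batch dimension with `unsqueeze(0)` gives the expected [1, nb_classes] layout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, nb_classes = 4, 3
# Hypothetical raw model outputs (logits) with shape [batch_size, nb_classes]
logits = torch.randn(batch_size, nb_classes)
print(logits.shape)  # torch.Size([4, 3])

# Softmax over dim=1 (the class dimension): each row sums to 1,
# so each sample gets its own probability distribution over the classes
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))

# Softmax over dim=0 normalizes across the batch instead: each column sums to 1,
# so the rows are generally NOT valid probability distributions
wrong = F.softmax(logits, dim=0)
print(wrong.sum(dim=1))

# Hypothetical 1-D output of shape [12] for a single sample:
# add a batch dimension first, then apply softmax over the classes
single = torch.randn(12)
probs_single = F.softmax(single.unsqueeze(0), dim=1)
print(probs_single.shape)  # torch.Size([1, 12])
```

With the batch dimension first, the same `dim=1` call works whether you feed one sample or many.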

Also, you can post code by wrapping it in three backticks (```).
This will make it easier to search for your question and code on this board.