PyTorch cross entropy loss and softmax

Theoretically, my model architecture should have a dense layer with a softmax activation at the output. However, in practice, since I’m using PyTorch’s cross entropy loss, which already applies the softmax internally, I do not add a softmax activation to my model. My question is: should I add a softmax activation exclusively for evaluation mode? Especially since the model has now been trained to output values in [0, 1] without needing one. Also, in theory, can I say that my model ends with a softmax activation? If not, is there a way to use PyTorch’s categorical cross entropy loss without the softmax activation?
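For illustration, here is a minimal sketch of what I mean (the layer sizes and data are made up):

```python
import torch
import torch.nn as nn

# Model ends with raw logits, no Softmax layer.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 5),
)

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

x = torch.randn(8, 20)
targets = torch.randint(0, 5, (8,))

# Training: feed the logits directly to the loss.
loss = criterion(model(x), targets)

# Evaluation: apply softmax manually, only when probabilities are needed.
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
```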

Thank you for your response.
Now let’s suppose one still wanted the Softmax activation to appear in the model’s definition. Would that be possible in PyTorch?

You can have a Softmax layer in the model definition, then in your training function apply a log to the output and use NLLLoss. This is equivalent to no last layer + CrossEntropyLoss. However, it’s not recommended, because computing the softmax and the log separately is numerically unstable, so don’t do that. See the note in the Softmax documentation.
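A rough sketch of that (not recommended) setup, with made-up layer sizes:

```python
import torch
import torch.nn as nn

# Not recommended: Softmax inside the model, log applied separately in training.
model = nn.Sequential(
    nn.Linear(20, 5),
    nn.Softmax(dim=1),
)
criterion = nn.NLLLoss()  # expects log-probabilities

x = torch.randn(8, 20)
targets = torch.randint(0, 5, (8,))

probs = model(x)
loss = criterion(torch.log(probs), targets)  # softmax then log in two steps -> numerically unstable
```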

Another valid option is to have a LogSoftmax layer at the end and use NLLLoss (again, equivalent to no last layer + CrossEntropyLoss). This one is numerically stable.
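Something like this sketch (again with made-up sizes):

```python
import torch
import torch.nn as nn

# Numerically stable: LogSoftmax inside the model, NLLLoss as the criterion.
model = nn.Sequential(
    nn.Linear(20, 5),
    nn.LogSoftmax(dim=1),
)
criterion = nn.NLLLoss()

x = torch.randn(8, 20)
targets = torch.randint(0, 5, (8,))

log_probs = model(x)
loss = criterion(log_probs, targets)

# At evaluation time, probabilities can be recovered with log_probs.exp().
```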


Thank you! I thought of something similar but had no idea about the stability issues related to applying softmax separately.