Multi-class cross entropy loss and softmax in PyTorch

KFrank · July 8, 2019, 7:53pm

Hi Brando!

If you want to use a cross-entropy-like loss function, you shouldn’t
use a softmax layer because of the well-known numerical problem:
computing the softmax (an exp()) and the log() as two separate steps
increases the risk of overflow and underflow.
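
To illustrate the numerical problem, here is a small sketch with made-up logits, assuming float32 tensors (the values are purely illustrative). A separate softmax can round a small probability to exactly zero, and the subsequent log() then turns it into -inf, while the fused log-softmax stays finite:

```python
import torch
import torch.nn.functional as F

# Illustrative logits with a large gap between the two classes (float32).
logits = torch.tensor([[0.0, 200.0]], dtype=torch.float32)
target = torch.tensor([0])  # the "unlikely" class is the correct one

# Two-step version: softmax first, then log.
probs = torch.softmax(logits, dim=1)          # probs[0, 0] underflows to 0.0
naive_loss = F.nll_loss(torch.log(probs), target)

# Fused version: log-softmax computed in one numerically stable step.
stable_loss = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(naive_loss)   # inf
print(stable_loss)  # 200.0
```

The gap of 200 is exaggerated on purpose, but the same thing can happen during training whenever a network becomes very confident about the wrong class.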

I gave a few words of explanation about this problem in a reply in
another thread:

You should either use nn.CrossEntropyLoss (which takes
pre-softmax logits, rather than post-softmax probabilities)
without a softmax-like layer, or use an nn.LogSoftmax layer,
and feed the results into nn.NLLLoss. (Both of these combine
an implicit softmax with the subsequent log in a way that avoids
the enhanced overflow problem.)
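
A minimal sketch of the two recommended options, using random stand-in logits and targets (the shapes and values here are assumptions for illustration, not anything from your model). Both routes should give the same loss up to round-off:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the raw (pre-softmax) output of a model: 4 samples, 3 classes.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])  # integer class labels

# Option 1: feed the logits straight into CrossEntropyLoss.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: LogSoftmax layer followed by NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

print(loss_ce, loss_nll)
print(torch.allclose(loss_ce, loss_nll))  # True
```

In practice option 1 is the simpler choice if you don't otherwise need the log-probabilities.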

If you are stuck for some reason with your softmax layer, you
should run the probabilities output by softmax through log(),
and then feed the log-probabilities to nn.NLLLoss (but expect
increased risk of numerical problems: a probability that has
underflowed to zero becomes -inf under log()).
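
If you do have to go this route, it could look roughly like the sketch below. The tiny model with a built-in softmax layer is hypothetical, just a stand-in for a network you can't change:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model that already ends in a softmax layer.
model = nn.Sequential(nn.Linear(5, 3), nn.Softmax(dim=1))

x = torch.randn(4, 5)                 # 4 samples, 5 features (made up)
targets = torch.tensor([1, 0, 2, 1])  # integer class labels

probs = model(x)                      # post-softmax probabilities
log_probs = torch.log(probs)          # can be -inf if a probability is 0
loss = nn.NLLLoss()(log_probs, targets)

# For comparison: the same loss computed from the pre-softmax logits.
logits = model[0](x)
reference = nn.CrossEntropyLoss()(logits, targets)

print(loss, reference)
```

With well-behaved logits like these the two values agree to within round-off; they only part ways when some probability gets small enough to underflow.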

(I am not aware of any single PyTorch cross-entropy loss function
that takes post-softmax probabilities directly.)

Good luck!

K. Frank