Hi Brando!
If you want to use a cross-entropy-like loss function, you shouldn’t
use a softmax layer, because taking the log of its output probabilities
comes with the well-known problem of an increased risk of overflow.
I gave a few words of explanation about this problem in a reply in
another thread.
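To make the problem concrete, here is a minimal sketch (the extreme
logit values are just made up for illustration) of how taking log()
of softmax() probabilities can blow up, while nn.CrossEntropyLoss
stays finite:

```python
import torch

logits = torch.tensor([[1000.0, 0.0]])   # illustrative, extreme logits
target = torch.tensor([1])               # the low-probability class

# softmax underflows the small probability to exactly zero,
# so its log becomes -inf and the loss becomes inf
probs = torch.softmax(logits, dim=1)                  # tensor([[1., 0.]])
print(torch.nn.NLLLoss()(torch.log(probs), target))   # tensor(inf)

# the fused log-softmax inside CrossEntropyLoss stays finite
print(torch.nn.CrossEntropyLoss()(logits, target))    # tensor(1000.)
```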
You should either use nn.CrossEntropyLoss (which takes pre-softmax
logits, rather than post-softmax probabilities) without a softmax-like
layer, or use an nn.LogSoftmax layer and feed the results into
nn.NLLLoss. (Both of these combine an implicit softmax with the
subsequent log in a way that avoids the enhanced overflow problem.)
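Here is a short sketch of those two recipes (the batch size and number
of classes are just illustrative assumptions); they compute the same
loss:

```python
import torch

logits = torch.randn(5, 3)              # raw, pre-softmax network outputs
target = torch.randint(0, 3, (5,))      # integer class labels

# option 1: CrossEntropyLoss applied directly to the logits
loss_ce = torch.nn.CrossEntropyLoss()(logits, target)

# option 2: LogSoftmax layer followed by NLLLoss
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
loss_nll = torch.nn.NLLLoss()(log_probs, target)

print(torch.allclose(loss_ce, loss_nll))   # True
```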
If you are stuck for some reason with your softmax layer, you should
run the probabilities output by softmax through log(), and then feed
the log-probabilities to nn.NLLLoss (but expect increased risk of
overflow).
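As a sketch (same illustrative shapes as above), that work-around
would look like:

```python
import torch

logits = torch.randn(5, 3)
target = torch.randint(0, 3, (5,))

probs = torch.nn.Softmax(dim=1)(logits)               # output of your softmax layer
loss = torch.nn.NLLLoss()(torch.log(probs), target)   # log() first, then NLLLoss
```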
(I am not aware of any single pytorch cross-entropy loss function
that takes post-softmax probabilities directly.)
Good luck!
K. Frank