Hi Lorenzo!
I didn’t look at your code, but if you wrote softmax and
cross-entropy as two separate functions, you are probably
tripping over the following problem.
Softmax contains exp() and cross-entropy contains log(), so this can happen:

large number → exp() → overflow (inf, and inf / inf gives NaN in the softmax) → log() → still NaN

even though, mathematically (i.e., without overflow),
log (exp (large number)) = large number (no NaN).
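Here is a minimal sketch of that failure mode (the specific logit values are just for illustration):

```
import torch

logits = torch.tensor([1000.0, 0.0, 0.0])   # one "large number"

# separate softmax: exp (1000.) overflows to inf, and inf / inf is nan
probs = torch.exp(logits) / torch.exp(logits).sum()
print(probs)              # tensor([nan, 0., 0.])

# cross-entropy's log() then keeps the nan
print(torch.log(probs))   # tensor([nan, -inf, -inf])
```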
Pytorch’s CrossEntropyLoss (for example) uses standard techniques to
combine, in effect, the log() and the exp() so as to avoid the overflow.
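For comparison, a short sketch with the same sort of logits run through
pytorch’s combined versions (log_softmax and CrossEntropyLoss), which stay
finite:

```
import torch

logits = torch.tensor([[1000.0, 0.0, 0.0]])   # same "large number", batch of one
target = torch.tensor([0])

# log-softmax combines the log() and the exp(), so no overflow
print(torch.log_softmax(logits, dim=1))             # tensor([[    0., -1000., -1000.]])

# CrossEntropyLoss takes the raw logits directly and stays finite
print(torch.nn.CrossEntropyLoss()(logits, target))  # tensor(0.)
```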
If you need to write your own loss function, you will need to use
these same techniques to avoid the NaNs. If you’re just doing
this for learning (Good for you!), you have learned that this is
one good reason to use pytorch’s built-in CrossEntropyLoss
function.
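If you do write your own, the standard technique is the so-called
log-sum-exp trick: subtract the row-wise maximum before exponentiating so
that exp() never sees a large argument. A hand-written sketch (the name
my_cross_entropy and the test values are just for illustration), checked
against pytorch’s built-in:

```
import torch

def my_cross_entropy(logits, target):
    # shift by the row-wise max so exp() never overflows
    shifted = logits - logits.max(dim=1, keepdim=True).values
    log_probs = shifted - shifted.exp().sum(dim=1, keepdim=True).log()
    return -log_probs[torch.arange(len(target)), target].mean()

logits = torch.tensor([[1000.0, 0.0, 0.0], [2.0, 1.0, 0.5]])
target = torch.tensor([0, 2])

print(my_cross_entropy(logits, target))             # finite, no nan
print(torch.nn.CrossEntropyLoss()(logits, target))  # same value
```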
Good luck!
K. Frank