The NaNs appear because applying softmax and log as two separate operations is numerically unstable.
If you're using CrossEntropyLoss for training, you could instead apply F.log_softmax at the end of your model and use NLLLoss as the criterion; CrossEntropyLoss combines exactly these two operations internally, so the loss will be equivalent, but much more stable than a separate softmax + log.
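Here is a minimal sketch of the difference; the logits and target are made-up values chosen to trigger the underflow:

```python
import torch
import torch.nn.functional as F

# Hypothetical inputs: the large spread in logits makes the naive version underflow.
logits = torch.tensor([[100.0, 0.0, -100.0]])
target = torch.tensor([1])

# Unstable: softmax underflows to 0 for very negative logits,
# then log(0) = -inf, whose gradient becomes NaN in the backward pass.
naive = torch.log(F.softmax(logits, dim=1))   # ≈ [[0., -100., -inf]]

# Stable: the fused op stays in log space (log-sum-exp trick).
fused = F.log_softmax(logits, dim=1)          # ≈ [[0., -100., -200.]]

# Equivalent losses: CrossEntropyLoss == LogSoftmax + NLLLoss.
loss_ce = F.cross_entropy(logits, target)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(loss_ce, loss_nll)
```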