Nan loss in RNN model?

The NaNs appear because applying softmax and log as two separate operations is numerically unstable: the softmax output can underflow to exactly zero, and log(0) is -inf, which then produces NaN gradients during backprop.

If you're currently applying softmax and log manually before a loss, you could instead use F.log_softmax at the end of your model and train with NLLLoss (this combination is exactly what CrossEntropyLoss computes internally on raw logits). The loss is mathematically equivalent, but much more stable, since log_softmax is computed via the log-sum-exp trick.
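A minimal sketch of the difference, using made-up extreme logits of the kind a diverging RNN can produce:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical extreme logits for a batch of one sample, two classes.
logits = torch.tensor([[1000.0, 0.0]])
target = torch.tensor([1])

# Unstable: softmax underflows to exactly 0 for the target class,
# so the separate log produces -inf (and NaN gradients in backward).
probs = F.softmax(logits, dim=1)
unstable_loss = -torch.log(probs[0, target])  # tensor([inf])

# Stable: log_softmax computes log(softmax(x)) via the log-sum-exp
# trick, so the result stays finite.
stable_loss = F.nll_loss(F.log_softmax(logits, dim=1), target)

# NLLLoss on log_softmax matches CrossEntropyLoss on raw logits.
ce_loss = nn.CrossEntropyLoss()(logits, target)
print(stable_loss.item(), ce_loss.item())  # both finite and equal
```

Here `stable_loss` and `ce_loss` agree exactly, while the manual softmax-then-log path has already blown up to inf for the same input.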
