You could check the forward activations for invalid values via forward hooks as described here. Once you've isolated which layer creates the NaN outputs, check its inputs as well as its parameters.
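A minimal sketch of this hook-based check (the helper name `register_nan_hooks` is just for illustration): it registers a forward hook on every submodule and raises as soon as any module emits a non-finite output, so the traceback names the offending layer.

```python
import torch
import torch.nn as nn

def register_nan_hooks(model):
    # Attach a forward hook to each submodule; the hook raises
    # immediately when a module produces NaN/Inf outputs.
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, tuple) else (output,)
            for out in outs:
                if isinstance(out, torch.Tensor) and not torch.isfinite(out).all():
                    raise RuntimeError(f"Non-finite values in the output of {name}")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
handles = register_nan_hooks(model)
out = model(torch.randn(2, 4))  # a clean input passes through silently

# Remove the hooks once debugging is done to avoid the per-forward overhead
for h in handles:
    h.remove()
```

Using `torch.isfinite` instead of `torch.isnan` also catches Infs, which often appear one step before the NaNs.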
If the parameters show invalid values, most likely the gradients were too large, the model was diverging, and the parameters overflowed. On the other hand, if the inputs contain NaNs, check the previous operation and see if/how it could create invalid values.
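The parameter check can be done with a short scan over `named_parameters()` (and their `.grad` attributes, to see whether the gradients blew up first); here a NaN is injected into a toy layer purely for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Simulate a diverged run by injecting a NaN into one weight (illustration only)
with torch.no_grad():
    model.weight[0, 0] = float("nan")

# Report every parameter (and gradient, if present) holding non-finite values
bad = [name for name, p in model.named_parameters()
       if not torch.isfinite(p).all()
       or (p.grad is not None and not torch.isfinite(p.grad).all())]
print(bad)
```

If this list is non-empty, the parameters themselves overflowed; if it is empty but the activations still contain NaNs, the invalid values are being created in the forward pass instead.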