You can use forward hooks as described here to check all intermediate outputs for NaN values.
Since the inputs are valid and the loss doesn't seem to explode, my guess is that a particular layer is producing these invalid outputs, which then propagate to the loss calculation.
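Here is a minimal sketch of that check, assuming a standard `nn.Module` model. The helper names `make_nan_hook` and `register_nan_hooks` are hypothetical, not part of PyTorch. The sketch registers a forward hook on every submodule and raises as soon as one of them returns a NaN. Hooks run right after each module's `forward` finishes, so the error names the first layer that produced the invalid values:

```python
import torch
import torch.nn as nn


def make_nan_hook(name):
    # Hypothetical helper: builds a forward hook that knows its module's name
    def hook(module, inputs, output):
        # A module may return a single tensor or a tuple/list of tensors
        outputs = output if isinstance(output, (tuple, list)) else (output,)
        for i, out in enumerate(outputs):
            if torch.is_tensor(out) and torch.isnan(out).any():
                raise RuntimeError(
                    f"NaN in output {i} of '{name}' ({module.__class__.__name__})"
                )
    return hook


def register_nan_hooks(model):
    # Hypothetical helper: attach the hook to every submodule, including the root
    handles = []
    for name, submodule in model.named_modules():
        handles.append(submodule.register_forward_hook(make_nan_hook(name or "<root>")))
    return handles  # call handle.remove() on each to detach the hooks again


# Usage: inject a NaN into the first layer's weights to trigger the check
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
handles = register_nan_hooks(model)
with torch.no_grad():
    model[0].weight[0, 0] = float("nan")
try:
    model(torch.randn(2, 4))
except RuntimeError as e:
    print(e)
```

Once you have found the culprit, remove the hooks with `handle.remove()`. If you also want to catch `Inf` values, replace the `torch.isnan(out).any()` test with `not torch.isfinite(out).all()`.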