NaN loss appearing after some time

You can use forward hooks as described here to check all intermediate outputs for NaN values.
Since the inputs are valid and the loss doesn't seem to explode, I guess a particular layer might be creating these invalid outputs, which are then propagated to the loss calculation.
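A minimal sketch of such a check, assuming a standard `nn.Module` model. The helper names `register_nan_hooks`, `make_nan_hook` and `_iter_tensors` are hypothetical and not part of PyTorch. Each hook raises on the first submodule whose output contains NaN (or Inf, since `torch.isfinite` is used), which should point at the layer producing the invalid values:

```python
import torch
import torch.nn as nn


def _iter_tensors(obj):
    """Yield every tensor inside a (possibly nested) tuple/list/dict output."""
    if isinstance(obj, torch.Tensor):
        yield obj
    elif isinstance(obj, (tuple, list)):
        for item in obj:
            yield from _iter_tensors(item)
    elif isinstance(obj, dict):
        for item in obj.values():
            yield from _iter_tensors(item)


def make_nan_hook(name):
    """Return a forward hook that raises if the module output is not finite."""
    def hook(module, inputs, output):
        # Modules such as nn.LSTM return nested tuples, so check every tensor.
        for out in _iter_tensors(output):
            if not torch.isfinite(out).all():
                raise RuntimeError(
                    f"Non-finite values in output of module '{name or '<root>'}' "
                    f"({type(module).__name__})"
                )
    return hook


def register_nan_hooks(model):
    """Attach the check to every (sub)module and return the hook handles."""
    return [
        module.register_forward_hook(make_nan_hook(name))
        for name, module in model.named_modules()
    ]


# Example usage on a small model
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
handles = register_nan_hooks(model)
out = model(torch.randn(2, 4))  # valid outputs: no error raised

# Remove the hooks once the offending layer has been found
for h in handles:
    h.remove()
```

Forward hooks run right after each submodule's `forward`, so the first error raised names the earliest layer whose output became invalid. Since the check adds overhead to every forward pass, it's meant for debugging rather than for regular training runs.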