My transformer NMT model is giving a "nan" loss value

If the invalid values are created in the forward pass, you could use e.g. forward hooks to check all intermediate outputs for NaNs and Infs (have a look at this post to see an example usage). If, on the other hand, you think the backward pass might be creating invalid gradients, which would then produce invalid parameters after the optimizer step, you could use torch.autograd.set_detect_anomaly(True) in your code to get more information about the failing layer.
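Here is a minimal sketch of the forward-hook approach, assuming your model is a standard nn.Module (the nn.Transformer below is just a stand-in for your NMT model, and check_output is a hypothetical helper name):

```python
import torch
import torch.nn as nn

def check_output(name):
    # Hypothetical helper: returns a hook that flags NaNs/Infs
    # in a module's output tensor and prints the module's name.
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor):
            if torch.isnan(output).any() or torch.isinf(output).any():
                print(f"Invalid values in output of {name}")
    return hook

# Stand-in for your transformer NMT model.
model = nn.Transformer(d_model=64, nhead=4)

# Register a forward hook on every submodule so the first layer
# producing NaNs/Infs is reported during the forward pass.
for name, module in model.named_modules():
    module.register_forward_hook(check_output(name))

# Run a forward pass as usual; shapes follow the default (S, N, E) layout.
src = torch.randn(10, 2, 64)
tgt = torch.randn(12, 2, 64)
out = model(src, tgt)
```

For the backward pass, anomaly detection is enabled once before the training loop; note that it adds overhead, so it's meant for debugging rather than regular training:

```python
import torch

# Any backward pass that produces NaN gradients will now raise an error
# with a traceback pointing at the forward operation that caused it.
torch.autograd.set_detect_anomaly(True)

# ... build the model and optimizer, then run the training loop as usual ...
# loss.backward()  # raises and reports the offending operation
```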