Gradient value is NaN

You could add torch.autograd.set_detect_anomaly(True) at the beginning of your script to get an error with a stack trace, which should point to the operation that created the NaNs and help you debug the issue.
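A minimal sketch of what this looks like, using a contrived NaN source (sqrt of a negative input) as the hypothetical reproduction:

```python
import torch

# Enable anomaly detection: autograd now records the forward stack trace
# of each op and raises as soon as a backward pass produces NaN values.
torch.autograd.set_detect_anomaly(True)

# Contrived NaN source: sqrt of a negative number is NaN, and its
# backward pass therefore returns NaN gradients.
x = torch.tensor([-1.0], requires_grad=True)
y = torch.sqrt(x)  # y is already NaN here

try:
    y.backward()
except RuntimeError as e:
    # The error message names the offending op, e.g.
    # "Function 'SqrtBackward0' returned nan values in its 0th output."
    print("caught:", e)
```

Note that anomaly mode slows training down considerably, so it is best enabled only while debugging and removed afterwards.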