Backpropagated gradients with a NaN loss


I would like to know if this is new behavior in PyTorch. As far as I remember, in the initial PyTorch releases the gradients of a parameter computed from a loss whose value was NaN (due to numerical saturation) were also NaN. However, I have noticed that when my loss goes to NaN, the gradients w.r.t. the parameters are 0. Is that the correct PyTorch behavior?
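For reference, here is the behavior I would have expected, sketched on a single linear layer (the layer and the artificial NaN loss are just an assumed minimal setup, not my actual model):

```python
import torch

# Minimal sketch: force the loss to NaN and inspect the gradients
layer = torch.nn.Linear(4, 1)
out = layer(torch.randn(2, 4))
loss = out.mean() * float("nan")  # artificially NaN loss

loss.backward()
# I would expect the NaN to propagate into every parameter gradient
print(torch.isnan(layer.weight.grad).all())
```

On the versions I remember, this prints `tensor(True)`, i.e. the gradients are NaN rather than 0.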


I would expect the same and can still reproduce it:

import torch
import torchvision.models as models

model = models.resnet18()
out = model(torch.randn(1, 3, 224, 224))

# torch.log(torch.tensor(-1.)) is NaN, which makes the loss NaN
loss = out.mean() * torch.log(torch.tensor(-1.))
loss.backward()

for name, param in model.named_parameters():
    print(name, torch.isnan(param.grad).all())

Could you post a code snippet to reproduce it?
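In the meantime, anomaly detection might help you locate where the NaN first appears without posting the full code. A minimal sketch (the `sqrt(-1.)` loss is just an artificial way to produce a NaN in backward):

```python
import torch

# Raises a RuntimeError pointing at the backward function that produced NaN
torch.autograd.set_detect_anomaly(True)

x = torch.tensor([-1.0], requires_grad=True)
loss = torch.sqrt(x).sum()  # sqrt(-1.) = NaN; its backward also yields NaN

try:
    loss.backward()
except RuntimeError as e:
    print("Anomaly detected:", e)
```

The error message names the offending backward function, which is usually enough to track down the saturating operation in a large model.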


It might be a bit difficult to post a snippet, as the code base is quite large and it is hard to reproduce that behaviour in isolation. I will try once I finish the project and have a stable version.

Thank you.