I would like to know if this is something new in PyTorch. As far as I remember, in the initial PyTorch releases the gradients of a parameter computed from a loss whose value was NaN (due to numerical saturation) were also NaN. However, I have noticed that when my loss goes to NaN, the gradients w.r.t. the parameters are 0. Is that the correct PyTorch behavior?
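For context, NaN normally propagates through arithmetic, so a parameter gradient computed via the chain rule from a NaN loss should itself be NaN. A minimal pure-Python sketch of that propagation (the variable names are just illustrative):

```python
import math

# Chain rule: dL/dw = dL/dy * dy/dw.
# If the upstream term dL/dy is NaN (poisoned by a NaN loss),
# the product, and hence the parameter gradient, is NaN as well.
upstream = math.nan   # dL/dy
local = 2.0           # dy/dw for y = 2 * w
grad_w = upstream * local
print(math.isnan(grad_w))  # prints True
```

Zero gradients from a NaN loss would therefore be surprising, which is what the question is about.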
I would expect the same (i.e. NaN gradients) and can still reproduce it:
import torch
from torchvision import models

model = models.resnet18()
out = model(torch.randn(1, 3, 224, 224))
loss = out.mean() * torch.log(torch.tensor(-1.))  # log(-1.) = NaN
loss.backward()
for name, param in model.named_parameters():
    print(name, param.grad)  # all gradients are NaN
Could you post a code snippet to reproduce it?
It might be a bit difficult to post a snippet, as the codebase is quite large, so reproducing that behaviour in isolation isn't straightforward. I will try once I finish the project and have a stable version.