Youyoun (Youyoun) · September 10, 2021, 10:40am · #1
Hello,
I have a question about how autograd handles NaNs:

import numpy as np
import torch

a = torch.tensor([np.nan, 2])
param = torch.tensor([1.], requires_grad=True)
(a * param)[1].backward()
print(param.grad)  # yields tensor([nan])
I was expecting it to return tensor([2.]), since param's grad is independent of a[0].
Is this normal behavior? Is there a way to force autograd to ignore the nan when the grad is independent of that value?
Thank you for your answer.
googlebot (Alex) · September 13, 2021, 8:31am · #2
You're broadcasting the param tensor to a.shape during the multiplication; the gradient would be independent of the nan with:

param = torch.tensor([1., 1.], requires_grad=True)
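A quick check of the difference (a minimal sketch; the commented outputs are what I'd expect under standard IEEE-754 semantics, where 0 * nan = nan):

import torch

a = torch.tensor([float('nan'), 2.])

# Shape-[1] param: it is broadcast to a.shape, and its grad is
# sum-reduced back over the broadcast dimension, mixing in the nan.
p_scalar = torch.tensor([1.], requires_grad=True)
(a * p_scalar)[1].backward()
print(p_scalar.grad)  # tensor([nan])

# Matching shape: no reduction, each grad entry stays separate.
p_vec = torch.tensor([1., 1.], requires_grad=True)
(a * p_vec)[1].backward()
print(p_vec.grad)  # tensor([nan, 2.]): entry 1 is clean; entry 0 is 0 * nan = nan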
Youyoun (Youyoun) · September 13, 2021, 1:25pm · #3
Hi,
Thank you for your answer. Why does broadcasting create any dependence on the nan? param is supposed to be a scalar value, but even when defining param = torch.tensor(1., requires_grad=True), param.grad is still equal to nan.
googlebot (Alex) · September 13, 2021, 2:52pm · #4
That's because of the derivative d(a*p)/dp = a, which contains the nan. It is computed elementwise after broadcasting, and the result is then sum-reduced back to param's shape, so the nan ends up in the sum (0 * nan is still nan).
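To make that concrete, here is a minimal sketch that replays the chain rule by hand for the scalar case (the intermediate tensors are my own reconstruction of the steps, not an inspection of autograd internals):

import torch

a = torch.tensor([float('nan'), 2.])
p = torch.tensor(1., requires_grad=True)
(a * p)[1].backward()
print(p.grad)  # tensor(nan)

# Manual replay of the same chain-rule steps:
grad_y = torch.tensor([0., 1.])   # d(y[1])/dy: 1 at the selected index, 0 elsewhere
grad_p_full = grad_y * a          # elementwise d(a*p)/dp = a -> tensor([nan, 2.])
grad_p = grad_p_full.sum()        # reduce over the broadcast dim -> tensor(nan)
print(grad_p)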