Autograd yields NaN even if the NaN input is independent of the output


I have a question about how autograd handles NaNs:

import torch
import numpy as np

a = torch.tensor([np.nan, 2.])
param = torch.tensor([1.], requires_grad=True)

(a * param)[1].backward()
print(param.grad)  # yields tensor([nan])

I was expecting it to return tensor([2.]), since param's gradient is independent of a[0].

Is this normal behavior? Is there a way to force autograd to ignore the NaN when the gradient is independent of that value?

Thank you for your answer.

You're broadcasting the param tensor to a.shape during the multiplication. The gradients would be elementwise-independent with:
param = torch.tensor([1., 1.], requires_grad=True)
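A quick sketch of the difference (same a as above; note that param.grad[0] can still come out as NaN, since the zero gradient from indexing is multiplied elementwise by a and 0 * NaN = NaN, but the entry you actually use is clean):

```python
import torch
import numpy as np

a = torch.tensor([np.nan, 2.])

# Per-element param: no broadcasting, so (a * param)[1] only
# involves a[1] and param[1].
param = torch.tensor([1., 1.], requires_grad=True)
(a * param)[1].backward()

# The gradient entry that matters is the clean value a[1] = 2.
print(param.grad[1])  # tensor(2.)
```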


Thank you for your answer. Why does broadcasting create any dependence on the NaN?
param is supposed to be a scalar value, but even when defining param = torch.tensor(1., requires_grad=True), param.grad is still NaN.

That's because the derivative d(a*p)/dp = a is computed elementwise after broadcasting, and the gradient for a scalar (or broadcast) param is the sum of those elementwise contributions: 0 * nan + 1 * 2 = nan. The NaN in a[0] therefore propagates into param.grad even though the output you selected doesn't depend on it.
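One workaround sketch (not a general autograd switch, just a restructuring of the computation): index a before the multiplication, so the NaN entry never participates in the graph at all.

```python
import torch
import numpy as np

a = torch.tensor([np.nan, 2.])
param = torch.tensor(1., requires_grad=True)

# Select the entry first, then multiply: the NaN in a[0] is
# never part of the computation, so no NaN reaches the gradient.
(a[1] * param).backward()
print(param.grad)  # tensor(2.)
```

This works because autograd only differentiates through operations that actually appear in the graph; slicing out the NaN beforehand keeps it out of both the forward and backward pass.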