Do tensors containing NaNs affect the loss.backward()?

Hi,

I have a loss function that looks like this:

disp_diff = (torch.max(eval_target[:, -1:, ...], eval_source[:, -1:, ...]) /
             torch.min(eval_target[:, -1:, ...], eval_source[:, -1:, ...])) - 1

disp_diff[disp_diff != disp_diff] = 0  # NaN != NaN, so this zeros out the NaN entries

loss = torch.sum(disp_diff, dim=[2, 3])

The max/min operation can produce 0/0 = NaN, which is why I set those positions to 0 afterwards. The problem is that after I call loss.backward(), the network only predicts NaNs in the next iteration. Is this because eval_target and eval_source have requires_grad=True and their NaNs can be propagated during the backward pass?
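
For reference, here is a minimal, self-contained sketch that reproduces the issue. The random inputs and shapes are made up (the real eval_target and eval_source come from my network); I just force a 0/0 in the last channel:

import torch

# hypothetical stand-ins for the network outputs (shapes are made up)
eval_target = torch.rand(2, 2, 4, 4, requires_grad=True)
eval_source = torch.rand(2, 2, 4, 4, requires_grad=True)

# force a 0/0 in the last channel so that max/min yields NaN
with torch.no_grad():
    eval_target[0, -1, 0, 0] = 0.
    eval_source[0, -1, 0, 0] = 0.

disp_diff = (torch.max(eval_target[:, -1:, ...], eval_source[:, -1:, ...]) /
             torch.min(eval_target[:, -1:, ...], eval_source[:, -1:, ...])) - 1
disp_diff[disp_diff != disp_diff] = 0  # zero out the NaNs after the fact

loss = torch.sum(disp_diff, dim=[2, 3])
loss.sum().backward()  # .sum() only to get a scalar for backward
print(eval_target.grad[0, -1, 0, 0])
> tensor(nan)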

I think you would have to avoid running into these invalid values, as manipulating the output afterwards would still run into the invalid operation in the backward pass, if I’m not mistaken.
Here is a small code example:

import torch

x = torch.randn(10, requires_grad=True)
d = torch.randn(10)
d[0] = 0.  # force a division by zero at index 0

out = x / d
print(out)
> tensor([   -inf, -0.9076, -0.6809, -0.0224, -0.4553, -1.4387, -1.9699,  0.2929,
         0.5049, -0.8641], grad_fn=<DivBackward0>)

out[~torch.isfinite(out)] = 0
print(out)
> tensor([ 0.0000, -0.9076, -0.6809, -0.0224, -0.4553, -1.4387, -1.9699,  0.2929,
         0.5049, -0.8641], grad_fn=<IndexPutBackward>)

out.mean().backward()
print(x.grad)
> tensor([    nan, -0.1204,  0.0864, -0.0858, -0.1498,  0.1549,  0.1525,  0.2894,
         0.0665, -1.0095])
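
For completeness, one possible way to avoid the invalid operation in the first place (just a sketch; the placeholder value 1. is arbitrary) is to make the denominator safe before the division and mask the result afterwards:

import torch

x = torch.randn(10, requires_grad=True)
d = torch.randn(10)
d[0] = 0.

# replace the zero denominator *before* dividing, so no invalid value is
# created in the forward pass (and thus none in the backward pass)
mask = d == 0
safe_d = torch.where(mask, torch.ones_like(d), d)   # 1. is an arbitrary placeholder
out = torch.where(mask, torch.zeros_like(x), x / safe_d)

out.mean().backward()
print(x.grad)  # finite everywhere; the masked position gets a zero gradient

Note that applying torch.where to the unsafe x / d directly would not help, since the division by zero still happens in the forward pass and produces a NaN gradient in the backward pass, just like the indexing approach above.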

Didn't know this. Thanks!