Unexpected NaN gradient propagated when applying boolean conditions

Hi rbregier!

Arul’s explanation is correct. torch.where() backpropagates a zero gradient
through the “branch not taken,” but that zero then gets multiplied by the inf
gradient of the division in that branch (1 / t with t == 0), and 0 * inf = nan.

An example:

>>> import torch
>>> torch.__version__
'1.10.2'
>>> s1 = torch.zeros (4, requires_grad = True)
>>> s2 = torch.zeros (4, requires_grad = True)
>>> t1 = torch.tensor ([0.5, 0.5, 0.0, 0.0])
>>> t2 = torch.tensor ([0.5, 0.5, 1.e-7, 1.e-7])
>>> torch.where (t1 > 1.e-6, s1 / t1, s1).sum().backward()
>>> torch.where (t2 > 1.e-6, s2 / t2, s2).sum().backward()
>>> s1.grad
tensor([2., 2., nan, nan])
>>> s2.grad
tensor([2., 2., 1., 1.])   # no gradients of 1.e7

It’s certainly a known issue, but I don’t think the devs consider it a bug.
Apparently it’s rooted deep in how autograd interacts with where()
and would be difficult to fix.

This github issue gives some explanation:

My approach is to get rid of the nans. You can safely feed an incorrect
value to the “zero” branch of torch.where(), as long as it’s not nan
(or inf, etc.).
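Applied to the toy example above, that just means clamping the denominator
before the division. The branch-not-taken gradient is then a large finite
number (1.e7) instead of inf, so where() can zero it out cleanly. A quick
illustrative sketch (s3 / t3 are just fresh names; I would expect the same
gradient as in the s2 case):

>>> s3 = torch.zeros (4, requires_grad = True)
>>> t3 = torch.tensor ([0.5, 0.5, 0.0, 0.0])
>>> torch.where (t3 > 1.e-6, s3 / t3.clamp (1.e-7), s3).sum().backward()
>>> s3.grad
tensor([2., 2., 1., 1.])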

In your specific case, I would take advantage of the fact that sinc() is an
even function (sinc (-x) = sinc (x)), and clamp() the denominator
away from zero:

sinc_base = torch.sin (x.abs()) / x.abs().clamp (1.e-7)

So two things happen: For x.abs() < 1.e-7, sinc_base will be an
incorrect value, but it won’t be nan. However, for x.abs() < 1.e-6,
torch.where() will switch you over to sinc_taylor, so you will never
see the incorrect sinc_base values.

Then, for gradients, when x.abs() < 1.e-6, torch.where() will (in part)
backpropagate 0 * sinc_base_gradient. Although sinc_base_gradient
will be incorrect for x.abs() < 1.e-7, it won’t be nan, so autograd will
correctly backpropagate 0 (rather than nan) for this piece of the
x.abs() < 1.e-6 branch.
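Putting the pieces together, here is a minimal sketch of the whole
construction. (The particular Taylor polynomial below is just an assumption
on my part; plug in whatever sinc_taylor you already have.)

import torch

def sinc_safe (x):
    # clamp keeps the denominator away from zero, so sinc_base is never nan
    # (it is merely incorrect for x.abs() < 1.e-7, values we never use)
    sinc_base = torch.sin (x.abs()) / x.abs().clamp (1.e-7)
    # low-order Taylor expansion of sinc(), accurate near zero (assumed here)
    sinc_taylor = 1.0 - x**2 / 6.0 + x**4 / 120.0
    # because sinc_base is finite everywhere, the 0 * sinc_base_gradient
    # term for the branch not taken is 0, not nan
    return torch.where (x.abs() < 1.e-6, sinc_taylor, sinc_base)

x = torch.tensor ([0.0, 1.e-8, 0.5, 2.0], requires_grad = True)
sinc_safe (x).sum().backward()
print (x.grad)   # finite everywhere, including at x == 0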

Best.

K. Frank
