Difference between SelectBackward and MaxBackward1

In PyTorch, if I do x = torch.tensor([1., 2., 3.], requires_grad=True), then max(x) returns tensor(3., grad_fn=<SelectBackward>), while torch.max(x) returns tensor(3., grad_fn=<MaxBackward1>). What is the difference between these two gradients? When calling backward() they both give the same answer at least in this small example.
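For concreteness, a minimal sketch of the setup (the exact grad_fn names may vary across PyTorch versions):

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# Both select the value 3., but record different backward nodes.
y1 = max(x)        # Python builtin: iterates over x, then picks one element
y2 = torch.max(x)  # native reduction -> MaxBackward1
print(y1.grad_fn, y2.grad_fn)

# With a unique maximum the gradients agree: everything flows to index 2.
print(torch.autograd.grad(y1, x)[0])  # tensor([0., 0., 1.])
print(torch.autograd.grad(y2, x)[0])  # tensor([0., 0., 1.])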

The two backward functions behave differently when the input has multiple elements tied for the maximum: SelectBackward routes the entire gradient to the first maximum element, while MaxBackward1 distributes it evenly across all tied elements.

import torch

x = torch.tensor([1., 1.], requires_grad=True)  # two elements tied for the maximum
print(torch.autograd.grad(torch.max(x), (x,))[0])  # tensor([0.5000, 0.5000]) -- split evenly
print(torch.autograd.grad(max(x), (x,))[0])        # tensor([1., 0.]) -- all to the first max

Thank you, very interesting. Is there a page that documents all of these backward functions and their behaviors? I couldn't find one by googling.

In general we don't consider the specific backward functions part of the public API, so unfortunately they aren't well documented. They are subject to change between releases, so try not to rely on this behavior.

For example, when you call max(tensor) in versions >= 1.7, the grad_fn is now UnbindBackward instead of SelectBackward, because max is a Python builtin that relies on Tensor.__iter__.
The behavior of torch.max evenly distributing gradients among tied maxima is also new in 1.7 and is documented in the release notes [0].
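A quick way to see this is to inspect the grad_fn directly; the exact names printed (e.g. UnbindBackward vs. UnbindBackward0) depend on the PyTorch version:

import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# Builtin max() iterates over x; on >= 1.7 Tensor.__iter__ uses unbind,
# so the selected element carries an UnbindBackward-style grad_fn.
print(max(x).grad_fn)        # e.g. <UnbindBackward0 object at ...>
print(torch.max(x).grad_fn)  # e.g. <MaxBackward1 object at ...>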

[0] "PyTorch 1.7 released w/ CUDA 11, New APIs for FFTs, Windows support for Distributed training and more", pytorch/pytorch releases, GitHub
