Difference between SelectBackward and MaxBackward1

In PyTorch, if I do x = torch.tensor([1., 2., 3.], requires_grad=True), then max(x) returns tensor(3., grad_fn=<SelectBackward>), while torch.max(x) returns tensor(3., grad_fn=<MaxBackward1>). What is the difference between these two gradients? When calling backward() they both give the same answer at least in this small example.

The two backward functions behave differently when the input has multiple indices tied for the maximum: SelectBackward routes the entire gradient to the first maximum, while MaxBackward1 distributes it evenly among all tied maxima.

import torch

x = torch.tensor([1., 1.], requires_grad=True)
print(torch.autograd.grad(torch.max(x), (x,))[0])  # tensor([0.5000, 0.5000])
print(torch.autograd.grad(max(x), (x,))[0])        # tensor([1., 0.])

Thank you, very interesting. Is there any page that documents all grads and their behaviors? I couldn’t find it by googling.

In general, we don’t consider which backward functions are used to be part of the public API, so unfortunately they aren’t very well documented. They are subject to change, so try not to rely on this behavior.
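Since the names are internal, a quick sketch of how to inspect them at runtime rather than hard-coding the strings (the exact names printed will vary between releases, so the comments below are only indicative):

```python
import torch

# grad_fn class names are an internal detail and change between versions,
# so check them dynamically if you ever need to.
x = torch.tensor([1., 2., 3.], requires_grad=True)
print(type(torch.max(x).grad_fn).__name__)  # e.g. "MaxBackward1"
print(type(max(x).grad_fn).__name__)        # e.g. "UnbindBackward0" on recent versions
```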

For example, when you call max(tensor) in versions >= 1.7, the grad_fn is now UnbindBackward instead of SelectBackward, because max is a Python builtin that relies on Tensor.__iter__.
The behavior of torch.max evenly distributing gradients among tied maxima is also new in 1.7, and that is documented in the release notes here [0].
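The tie-breaking difference can be checked directly. A minimal sketch, assuming PyTorch >= 1.7:

```python
import torch

# torch.max splits the gradient evenly among tied maxima (>= 1.7),
# while the Python builtin max (iterating via Tensor.__iter__) picks
# the first tied element and sends it the whole gradient.
x = torch.tensor([2., 2., 1.], requires_grad=True)
torch.max(x).backward()
print(x.grad)  # tensor([0.5000, 0.5000, 0.0000])

y = torch.tensor([2., 2., 1.], requires_grad=True)
max(y).backward()
print(y.grad)  # tensor([1., 0., 0.])
```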

[0] Release PyTorch 1.7 released w/ CUDA 11, New APIs for FFTs, Windows support for Distributed training and more · pytorch/pytorch · GitHub
