In PyTorch, if I do `x = torch.tensor([1., 2., 3.], requires_grad=True)`, then `max(x)` returns `tensor(3., grad_fn=<SelectBackward>)`, while `torch.max(x)` returns `tensor(3., grad_fn=<MaxBackward1>)`. What is the difference between these two gradients? When calling `backward()` they both give the same result, at least in this small example.

The two backward functions differ when multiple indices are tied for the maximum: SelectBackward routes the entire gradient to the first maximal element, while MaxBackward1 distributes it evenly across all of them.

```
import torch

x = torch.tensor([1., 1.], requires_grad=True)
print(torch.autograd.grad(torch.max(x), (x,))[0])  # tensor([0.5000, 0.5000])
print(torch.autograd.grad(max(x), (x,))[0])        # tensor([1., 0.])
```

Thank you, very interesting. Is there a page that documents all the backward functions and their gradient behaviors? I couldn't find one by googling.

In general, which backward functions are used is not considered part of the public API, so unfortunately they aren't well documented. They are also subject to change, so try not to rely on this behavior.

For example, when you call `max(tensor)` in versions >= 1.7, the grad_fn is now UnbindBackward instead of SelectBackward, because `max` is a Python builtin that relies on `Tensor.__iter__`.
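To make the iteration explicit, here is a small sketch (reusing the toy tensor from the question) showing that the builtin `max` just compares the 0-dim element tensors produced by `Tensor.__iter__`, so the result's grad_fn comes from however iteration is implemented rather than from a max reduction:

```
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# The builtin max iterates over x via Tensor.__iter__ and compares the
# resulting 0-dim element tensors; the winner is one of those elements,
# so its grad_fn reflects the iteration mechanism, not a max reduction.
elements = list(iter(x))  # calls Tensor.__iter__
m = max(elements)         # equivalent to max(x)
print(m.grad_fn)          # the iteration-derived backward node
```

Because the result is just one selected element, the gradient flows only to that element's position.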

The behavior of `torch.max` evenly distributing gradients among tied maxima is also new in 1.7, and that change is documented in the release notes here [0].
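As a quick check that the even split generalizes beyond two elements, here is a sketch with a three-way tie (assuming PyTorch >= 1.7), where each tied element should receive a third of the gradient:

```
import torch

# Three tied maxima: torch.max splits the incoming gradient evenly,
# so each element receives 1/3.
x = torch.tensor([2., 2., 2.], requires_grad=True)
g = torch.autograd.grad(torch.max(x), (x,))[0]
print(g)  # tensor([0.3333, 0.3333, 0.3333])
```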