Confused about torch.max() and gradient

jam12 · March 3, 2018, 9:58am

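import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 3), requires_grad=True)
z, _ = torch.max(x, 1)  # max over dim 1; the second return value is the index
z.backward()            # z has a single element, so no gradient argument is needed
print(x.grad)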

Variable containing:
 1  0  0
[torch.FloatTensor of size 1x3]

I understand that max is not a differentiable operation. So why can I still get a gradient here?

jpeg729 · March 3, 2018, 10:03am

max simply selects the greatest value and ignores the others, so max is the identity operation for that one element. Therefore the gradient can flow backwards through it for just that one element.
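
As a rough illustration of the point above, here is a minimal sketch with hand-picked values (my own, not from the original post), so that the position of the maximum is known in advance. The gradient comes out as 1 at that position and 0 elsewhere:

```python
import torch

# Fixed values so the largest entry is known to sit at index 1.
x = torch.tensor([[0.5, 2.0, -1.0]], requires_grad=True)

z, idx = torch.max(x, 1)  # z: largest value per row, idx: its position
z.sum().backward()        # sum() reduces the one-element tensor to a scalar

print(idx)     # position of the maximum
print(x.grad)  # 1 at the maximum's position, 0 everywhere else
```

In other words, for this input the output is just a copy of x[0, 1], and autograd treats it that way.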

SimonW · March 3, 2018, 3:59pm

Also, argmax is not continuous almost everywhere. But max is continuous everywhere.

fiona_H · February 25, 2019, 4:47pm

z,y = torch.max(x,1)

So is that the reason y doesn’t have a gradient function?

SunHaozhe · July 19, 2019, 12:09pm

What do you mean by “argmax is not continuous almost everywhere. But max is continuous everywhere”?

Do you mean that we can backpropagate through the max operation but not through the argmax operation?

SimonW · July 19, 2019, 10:08pm

To be precise, I should have said that argmax is not differentiable, but max is.
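
One way to check this in practice is sketched below, using example values of my own choosing. The value returned by torch.max is tracked by autograd, while the index is an integer tensor with no grad_fn. The index also stays the same under a small perturbation that keeps the ordering, which is why its derivative is zero wherever it exists:

```python
import torch

x = torch.tensor([[0.5, 2.0, -1.0]], requires_grad=True)
val, idx = torch.max(x, 1)

print(val.requires_grad, val.grad_fn is not None)  # the value is tracked by autograd
print(idx.dtype, idx.requires_grad, idx.grad_fn)   # integer index, not tracked

# Shifting every entry by the same small amount keeps the ordering,
# so the index does not move.
_, idx_perturbed = torch.max(x.detach() + 1e-3, 1)
same_index = torch.equal(idx, idx_perturbed)
print(same_index)
```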

pcshih · August 8, 2019, 7:15am

Is there a way to make the index have a gradient function?
For example:

import torch

h = torch.randn(1, 2, 5, requires_grad=True)
print(h)
val, idx = h.max(1, keepdim=True)  # max over dim 1: values and their indices
print(val)
print(idx)

outputs are:

tensor([[[-0.5372, -0.4683,  0.4891, -0.1686, -0.4147],
         [-1.4412,  1.2837, -0.4467,  0.1731,  1.3256]]], requires_grad=True)
tensor([[[-0.5372,  1.2837,  0.4891,  0.1731,  1.3256]]],
       grad_fn=<MaxBackward0>)
tensor([[[0, 1, 0, 1, 1]]])

I want tensor([[[0, 1, 0, 1, 1]]]) to have a gradient function.

SimonW · August 8, 2019, 5:56pm

It is mathematically not differentiable, so no.

pcshih · August 9, 2019, 1:40am

In section 3.3 of this paper:

We first select Y frames (i.e. keyframes) based on the prediction scores from the decoder.

The decoder output has shape [2, 320]: a non-keyframe score and a keyframe score for each of the 320 frames. We want to derive a 0/1 vector from the decoder output, but the step from [2, 320] to a 0/1 vector does not seem to be differentiable…

How can this be implemented in PyTorch?

Thank you very much.
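
For reference, one common workaround for this kind of hard selection is the straight-through trick: use the hard 0/1 values in the forward pass and borrow the gradient of a softmax in the backward pass. The sketch below is only an assumption about what might be suitable here, not the method used in the paper. The random `scores` tensor is a stand-in for the decoder output, and `keyframe_mask` is a name made up for the illustration. The resulting gradient is an approximation, since the hard selection itself has no useful gradient.

```python
import torch

torch.manual_seed(0)
# Stand-in for the decoder output (assumed layout):
# row 0 = non-keyframe score, row 1 = keyframe score, one column per frame.
scores = torch.randn(2, 320, requires_grad=True)

probs = torch.softmax(scores, dim=0)  # differentiable, values in (0, 1)

# Hard one-hot choice per frame; argmax carries no gradient.
hard = torch.zeros_like(probs).scatter_(0, probs.argmax(dim=0, keepdim=True), 1.0)

# Straight-through: the forward value equals `hard` exactly,
# while the backward pass sees the gradient of `probs`.
one_hot = hard + (probs - probs.detach())
keyframe_mask = one_hot[1]  # 0/1 vector of length 320

keyframe_mask.sum().backward()
print(keyframe_mask.shape, scores.grad is not None)
```

torch.nn.functional.gumbel_softmax with hard=True applies the same idea with added Gumbel noise, which may or may not be what you want for this use case.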

netaglazer · January 21, 2020, 8:34am

I guess that means the same can be said of the minimum operation?
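
A quick check suggests so. The sketch below uses example values of my own and shows that torch.min routes the gradient to the position of the smallest element only:

```python
import torch

x = torch.tensor([[0.5, 2.0, -1.0]], requires_grad=True)

z, idx = torch.min(x, 1)  # z: smallest value per row, idx: its position
z.sum().backward()

print(idx)     # position of the minimum
print(x.grad)  # 1 at the minimum's position, 0 everywhere else
```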

netaglazer · January 23, 2020, 9:27pm

Also, can you explain it a little more?
Why does the fact that it is the identity operation for the max element change the situation?
And if so, why was softmax invented?

entslscheia · February 18, 2020, 5:08pm

Actually, softmax is more like softargmax… I have to say softmax is a terrible name.
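
To make the naming point concrete, here is a small sketch with example values and temperatures of my own choosing. As the temperature shrinks, softmax approaches the one-hot encoding of argmax, while the temperature-scaled logsumexp approaches the maximum value itself, so logsumexp is the closer analogue of a "soft max":

```python
import torch

x = torch.tensor([0.5, 2.0, -1.0])

for temperature in (1.0, 0.1, 0.01):
    # Approaches the one-hot vector of argmax(x) as temperature -> 0.
    soft_argmax = torch.softmax(x / temperature, dim=0)
    # Approaches max(x) from above as temperature -> 0.
    soft_max = temperature * torch.logsumexp(x / temperature, 0)
    print(temperature, soft_argmax, soft_max)
```

Both functions are smooth in every input, unlike the hard argmax, which is one reason softmax is used where a differentiable stand-in for "pick the largest" is needed.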