Confused about torch.max() and gradient

x = Variable(torch.randn(1,3), requires_grad=True)
z, _ = torch.max(x, 1)
z.backward()
print(x.grad)

Variable containing:
 1  0  0
[torch.FloatTensor of size 1x3]

I understand that max is not a differentiable operation. So why can I still get the gradient here?


max simply selects the greatest value and ignores the others, so max is the identity operation for that one element. Therefore the gradient can flow backwards through it for just that one element.
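A quick sketch of that behavior (the concrete values here are made up for illustration):

```python
import torch

# The gradient of max flows only to the element that achieved the
# maximum; every other entry receives a zero gradient.
x = torch.tensor([1.0, 5.0, 3.0], requires_grad=True)
z = x.max()
z.backward()
print(x.grad)  # tensor([0., 1., 0.])
```

The winning element (5.0) passes through unchanged, so its gradient is 1, exactly as if max were the identity on that element.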


Also, argmax is not continuous almost everywhere. But max is continuous everywhere.

z,y = torch.max(x,1)

So that is the reason y doesn’t have a gradient function?

What do you mean by “argmax is not continuous almost everywhere. But max is continuous everywhere” ?

Do you mean that we can do backpropagation with max operation but not argmax operation ?

To be precise, I should have said that argmax is not differentiable, but max is.
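For instance (using the current tensor API rather than the old Variable one), the returned max values sit on the autograd graph while the argmax indices do not:

```python
import torch

x = torch.randn(1, 3, requires_grad=True)
values, indices = torch.max(x, dim=1)

# The max values are connected to the autograd graph...
print(values.grad_fn is not None)  # True
# ...but the argmax indices are integer-valued and carry no grad_fn.
print(indices.grad_fn)  # None
print(indices.dtype)    # torch.int64
```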

Is there a way to make the index have a gradient function?

import torch
h = torch.randn(1,2,5, requires_grad=True); print(h)
val,idx = h.max(1, keepdim=True)

outputs are:

tensor([[[-0.5372, -0.4683,  0.4891, -0.1686, -0.4147],
         [-1.4412,  1.2837, -0.4467,  0.1731,  1.3256]]], requires_grad=True)
tensor([[[-0.5372, 1.2837, 0.4891, 0.1731, 1.3256]]],
       grad_fn=<MaxBackward0>)
tensor([[[0, 1, 0, 1, 1]]])

I want the index tensor([[[0, 1, 0, 1, 1]]]) to have a gradient function.

It is mathematically not differentiable, so no.

In this paper section 3.3

We first select Y frames (i.e. keyframes) based on the prediction scores from the

The decoder output is [2, 320], i.e. a non-keyframe score and a keyframe score for each of the 320 frames. We want to derive a 0/1 vector from the decoder output, but the mapping [2, 320] -> 0/1 vector does not seem differentiable…

How to implement this in pytorch?

Thank you very much.
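One standard workaround for this kind of hard 0/1 selection (not specific to that paper, and the shapes below just follow the [2, 320] layout you described) is the Gumbel-softmax straight-through estimator, available as `torch.nn.functional.gumbel_softmax`. With `hard=True` the forward pass emits an exact one-hot vector per frame while the backward pass uses the soft probabilities, so the selection remains trainable. A minimal sketch:

```python
import torch
import torch.nn.functional as F

# Decoder output: per-frame scores of shape [2, 320]
# (row 0 = non-keyframe score, row 1 = keyframe score).
scores = torch.randn(2, 320, requires_grad=True)

# gumbel_softmax with hard=True returns one-hot vectors in the forward
# pass but routes gradients through the soft probabilities
# (straight-through estimator).
one_hot = F.gumbel_softmax(scores.t(), tau=1.0, hard=True, dim=-1)  # [320, 2]
keyframe_mask = one_hot[:, 1]  # exact 0/1 vector of length 320

# Gradients still reach the decoder scores despite the hard selection.
loss = keyframe_mask.sum()
loss.backward()
print(scores.grad.shape)  # torch.Size([2, 320])
```

Whether this relaxation matches what the paper's authors actually did is a separate question; it is simply the most common differentiable surrogate for an argmax-style 0/1 decision in PyTorch.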