Autograd and the locations from torch.max (backprop and gradients)

Say I have a tensor X of size (x1, x2, x3, x4, x5) and I do,

values, locations = torch.max(X, 2)

Is there a good way to do autograd for the locations, rather than the values? Yes, I could multiply the values by the locations to come up with a weighted metric, but what I’m really looking for is just the locations.

Right now after I do backward(), I get None for my gradient.
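A minimal sketch of the situation (using a smaller 3-d tensor for brevity, but the same applies to the 5-d case): the values returned by torch.max are connected to the autograd graph, while the locations are plain integer indices with no graph attached.

```python
import torch

X = torch.randn(2, 3, 4, requires_grad=True)
values, locations = torch.max(X, 2)

print(values.grad_fn)           # a backward node: values are differentiable
print(locations.dtype)          # torch.int64: integer indices
print(locations.requires_grad)  # False: no graph back to X

# Backprop through the values works fine...
values.sum().backward()
print(X.grad is not None)       # True

# ...but there is no path from `locations` back to X at all,
# so nothing involving only the locations ever populates X.grad.
```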


Short answer: no. If you look at the locations as a mathematical function of the input, it is piecewise constant (constant almost everywhere), so wherever its gradient is defined, it is 0.
I am not sure what multiplying the values by the locations is supposed to achieve, but it does not look like it does what you want.

This is exactly why we use cross-entropy loss for classification, for example. Otherwise, we could just argmax the output of the network and check whether it matches the ground truth label.
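To make the contrast concrete, here is a hedged sketch (the tensor shapes and names are made up for illustration): cross-entropy over the raw logits gives a differentiable training signal, while the argmax is only usable as a metric.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, 10 classes
target = torch.tensor([3, 7, 0, 2])

# Differentiable surrogate: cross entropy compares the full logit
# distribution against the target class, so gradients flow to `logits`.
loss = F.cross_entropy(logits, target)
loss.backward()
print(logits.grad is not None)  # True

# Non-differentiable: argmax picks the predicted class index (int64,
# detached from the graph), so it can only serve as an evaluation metric.
pred = logits.argmax(dim=1)
accuracy = (pred == target).float().mean()
```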
