Interpreting the output of torch.autograd.grad

The network I am optimizing has an internal bottleneck where all activations are interpretable distributions over a map (e.g., a heatmap). I am trying to visualize the gradients of these distributions wrt. the loss I am optimizing. Right now I am doing it like this:

# [1] Run a forward pass on the network, 
# getting the output logits and the internal distribution activations 
# (which are Tensors)
logits, distributions = network(inputs)

# [2] Compute the loss for this graph
loss = compute_loss(logits, targets) 

# [3] Get the gradients of the loss wrt. the distribution tensors.
# retain_graph=True is defensive: it keeps the saved buffers alive in case
# the backward pass in [5] shares any nodes with this one.
dist_gradients = torch.autograd.grad(loss, distributions, retain_graph=True) 

# [4] Call my own visualization code which displays the original distributions and the
# gradients layered on top to see how the distributions should change when optimizing
# the loss. Both tensors are the same size (the size of the heatmap, so in my case 25x25)
visualize_grads(distributions, dist_gradients)

# [5] Run a backward pass seeded at the distribution tensors and apply the
# accumulated gradients with the optimizer step; this only updates the
# parameters used to compute the distribution tensors (not the parameters
# that map the distributions to the logits), but that is intentional
torch.autograd.backward(distributions, dist_gradients)

# [6] Visualize the new distributions after applying the step and running another forward pass
# on the same inputs
_, new_distributions = network(inputs)
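As a sanity check on step [5], the partial backward can be exercised on a toy graph. This sketch (with made-up scalar weights `w1` and `w2` standing in for the two halves of the network; names are illustrative, not from my code) shows that seeding `torch.autograd.backward` at an intermediate tensor only populates gradients for parameters upstream of it:

```python
import torch

# w1 produces the intermediate "distribution"; w2 maps it to the logits.
w1 = torch.tensor(2.0, requires_grad=True)
w2 = torch.tensor(3.0, requires_grad=True)

x = torch.tensor(1.0)
dist = w1 * x          # intermediate activation, like `distributions`
logits = w2 * dist
loss = logits ** 2     # loss = (w2 * w1 * x)^2 = 36

# Gradient of the loss wrt the intermediate tensor only.
# retain_graph=True keeps all saved buffers alive (defensive).
(g,) = torch.autograd.grad(loss, dist, retain_graph=True)
# g = dloss/ddist = 2 * logits * w2 = 2 * 6 * 3 = 36

# Backprop from the intermediate tensor: only w1 (upstream of dist)
# receives a gradient; w2 is left untouched.
torch.autograd.backward(dist, g)
```

After this runs, `w1.grad` holds `g * ddist/dw1 = 36`, while `w2.grad` is still `None`, matching the intended "only update the parameters below the bottleneck" behavior.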

The problem I am facing is that the gradients coming out of torch.autograd.grad appear to have the opposite directionality from what I expect. I verify this by comparing the visualized distributions from steps [4] (before optimization) and [6] (after optimization), and by inspecting the dist_gradients used to optimize. For example, say position [0, 0] in the distributions had a value of 0.2 before optimization and 0.4 after. However, dist_gradients[0][0] < 0, which, to me, suggests the optimization should be decreasing that value, not increasing it.

Is it possible that I should negate the gradients before visualizing them this way? After discussing with a few others, it seems that gradients within a graph actually point opposite to the (at least to me) intuitive direction. This seems to be supported by the code for SGD, where each parameter's gradient is multiplied by the negative of the learning rate in the optimizer step.
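If negation is the right move, it would be a one-line change before the call to my visualization code. A minimal sketch with made-up 25x25 tensors (the real ones come from the forward pass and torch.autograd.grad above):

```python
import torch

# Made-up stand-ins for one 25x25 distribution and its gradient wrt the loss.
distribution = torch.full((25, 25), 0.2)
grad = torch.full((25, 25), -0.5)

# Negate so the overlay shows the descent direction: the way each value
# will actually move when the loss is minimized.
descent_dir = -grad

# A negative raw gradient thus corresponds to a positive descent direction,
# consistent with a value like 0.2 growing toward 0.4 after the step.
```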



Gradients give you the direction of steepest ascent, so if you are looking for a descent direction, then yes, you should flip the signs.
As you saw, SGD updates with x = x - lr * grad (note the negative sign).
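A quick scalar check (a toy function, not from your network) makes the sign convention concrete:

```python
import torch

# f(x) = (x - 3)^2 is minimized at x = 3.
x = torch.tensor(1.0, requires_grad=True)
loss = (x - 3) ** 2

# torch.autograd.grad returns df/dx, the steepest-ASCENT direction.
(grad,) = torch.autograd.grad(loss, x)
# At x = 1: df/dx = 2 * (1 - 3) = -4. The gradient is negative even though
# descending the loss means INCREASING x -- the same pattern as in your heatmap.

# SGD subtracts the gradient, so x moves up, toward the minimum.
lr = 0.1
x_new = x.detach() - lr * grad  # 1.0 - 0.1 * (-4.0) = 1.4
```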

Interesting. Thanks!! This clears up a lot.