(Newbie Question) Getting the gradient of output with respect to the input

Hello all, I’m new to using neural network libraries, so I’m sorry if this is a stupid question. I have a pretrained network with a 28x28 input (MNIST) image and 10 outputs. I want to get the gradient of one of those outputs with respect to the input.

To do this, I do one forward pass with output=net(input), where input is a Variable with requires_grad=True. After this, I want to do a backward pass, but backward requires a vector as its argument. Is this vector a kind of weight list, where the gradients of the different outputs are summed according to those weights? In particular, since I want the gradient of only one of those outputs, should I use output.backward([0,0,0,1,0,...,0]) or something else?

Thank you very much in advance!

Variables have requires_grad=True by default.
What you want to do, if you’re doing MNIST classification, is take the output and labels and compute the CrossEntropyLoss. This will be a scalar variable. You can now do loss.backward() and it will backprop i.e. compute gradients back through to the inputs.

If you want the gradOut or gradInp of a layer, check out the register_hook function.

@DiffEverything By default the requires_grad is False:

import torch
a = torch.autograd.Variable(torch.Tensor(10,10))
print(a.requires_grad) # prints False

@ckanbak if what you’re looking for is how the input should vary to increase its score for a given digit, then yes, you should wrap the input in a Variable with requires_grad=True and then use what you proposed (note that you may want to convert the list into a Tensor with torch.Tensor([0,0,0,1,0,...,0])).
You can also, given the size 10 vector output and the index of the digit you’re looking for do: current_score = output[selected_digit] and then current_score.backward().
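
For later readers, here is a minimal sketch of both approaches in current PyTorch, where Variable has been merged into Tensor. The small linear model is a hypothetical stand-in for the pretrained network, and the names image, grad_one_hot and grad_indexed are assumptions made for illustration. It also assumes the output has a batch dimension, so the score is selected with output[0, selected_digit].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained MNIST network in the question.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Leaf tensor that tracks gradients (the role of the Variable in the thread).
image = torch.rand(1, 1, 28, 28, requires_grad=True)
selected_digit = 3

# Option 1: pass a one-hot vector to backward().
output = net(image)                      # shape (1, 10)
one_hot = torch.zeros_like(output)
one_hot[0, selected_digit] = 1.0
output.backward(one_hot)
grad_one_hot = image.grad.clone()

# Option 2: select the scalar score and call backward() with no argument.
image.grad = None                        # reset, since gradients accumulate
output = net(image)
output[0, selected_digit].backward()
grad_indexed = image.grad.clone()

# Both options should give the same gradient with respect to the input.
print(torch.allclose(grad_one_hot, grad_indexed))
```

The vector passed to backward() acts as the weights of a weighted sum over the outputs, which is why a one-hot vector picks out a single output.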

my bad. I mixed up Parameter and Variable for requires_grad.
sorry.

Thanks a lot for your reply!
One more question: I understand that I have to use retain_variables=True if I want to compute the gradient multiple times. Can I still do that without creating additional variables if I use current_score = output[selected_digit] and then current_score.backward(retain_variables=True)? Or do I have to do it with output.backward(torch.Tensor([0,0,0,1,0,...,0]), retain_variables=True) while changing the position of the 1 for different digits?

Hi,

Yes it will work.
I think the simplest way if you want to accumulate the gradients associated with a given set of digits:

for digit in selected_digits:
    output[digit].backward(retain_variables=True)

@albanD @DiffEverything Hi, thanks for your reply. I have a follow-up question. In my implementation the input is passed through two different networks, and I am interested in the gradient of the final output with respect to both x and the intermediate output y (i.e. both dz/dx and dz/dy). I tried the following code, but y.grad is always zero. Currently I can work around it temporarily by converting y to numpy, re-creating the Variable in the same way as x, and then passing it through net2. I am wondering if it is possible to get y.grad in the very first pass. Am I missing anything?

net1.zero_grad(); net2.zero_grad()
x = Variable(torch.ones((1,20)), requires_grad=True)
y = net1(x)
z = net2(y)
z.backward(torch.ones(z.size()),retain_variables=True)
print(x.grad) # this will give non-zero value
print(y.grad) # this is all zero

net1.zero_grad(); net2.zero_grad()
y = torch.from_numpy(y.data.numpy())
y = Variable(y, requires_grad=True)
z = net2(y)
z.backward(torch.ones(z.size()),retain_variables=True)
print(y.grad) # this will give non-zero value

y.grad is the gradient of a non-leaf variable. To access its gradient, you have to use a backward hook.
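
The example that originally followed this reply is not preserved here, so the following is only a sketch of the idea in current PyTorch, not the original code. net1 and net2 are hypothetical linear stand-ins for the two networks in the question, and the saved dictionary is an assumption made for illustration. The sketch uses Tensor.register_hook to capture dz/dy during the backward pass, and also shows Tensor.retain_grad() as an alternative that keeps y.grad populated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for net1 and net2 from the question.
net1 = nn.Linear(20, 5)
net2 = nn.Linear(5, 3)

x = torch.ones(1, 20, requires_grad=True)  # leaf tensor
y = net1(x)                                # non-leaf: produced by an operation
z = net2(y)

saved = {}
# The hook receives dz/dy when it is computed during the backward pass.
y.register_hook(lambda grad: saved.update(y=grad.clone()))
# Alternative: ask autograd to keep y.grad after backward().
y.retain_grad()

z.backward(torch.ones_like(z))

print(x.grad.shape)       # gradient with respect to the leaf input
print(saved["y"].shape)   # gradient with respect to y, captured by the hook
print(torch.allclose(saved["y"], y.grad))
```

Either mechanism avoids the round trip through numpy, because y stays connected to the graph that produced it.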

Hi, would this still be the preferred approach in version 1.0? I can’t seem to get retain_variables=True to work, and I am trying to do the same thing.

Thanks

John

Hi,

You can check the doc here: the option you should use in 1.0 is retain_graph=True.

Thanks, I have looked at that. If I want to get the gradient of each output with respect to the input in a loop such as the one above, would I need to do the following?
for digit in selected_digits:
    output[digit].backward(retain_graph=True)
    grad[digit] = input.grad

If I do this, will the gradients in input.grad be incremented each time, or will they be overwritten? I read that gradients are retained and only cleared when zero_grad() is called on the model. If they are incremental, can I get each individual value by taking the difference between the previous value and the latest?

Thanks again

They will accumulate.
You can clone the values you want to save and then zero the gradients (for an input tensor, input.grad.zero_()) before calling the next backward.
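
A minimal sketch of that clone-then-zero loop, assuming current PyTorch. The linear model is a hypothetical stand-in for the pretrained network, and image, selected_digits and grads are illustrative names. It also assumes a batch dimension, so each score is selected with output[0, digit].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained MNIST network.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
image = torch.rand(1, 1, 28, 28, requires_grad=True)
selected_digits = [0, 3, 7]

output = net(image)   # one forward pass, reused for every digit
grads = {}
for digit in selected_digits:
    if image.grad is not None:
        image.grad.zero_()                        # clear the previous digit's gradient
    output[0, digit].backward(retain_graph=True)  # keep the graph for the next digit
    grads[digit] = image.grad.clone()             # copy before it is zeroed again

for digit, grad in grads.items():
    print(digit, grad.shape)
```

Without the clone, every entry of grads would refer to the same tensor that is zeroed on the next iteration. Without the zeroing, each entry would hold the running sum of all gradients computed so far.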