(Newbie Question) Getting the gradient of output with respect to the input

ckanbak · March 22, 2017, 3:11pm

Hello all, I’m new to using neural network libraries, so I’m sorry if this is a stupid question. I have a pretrained network with a 28x28 input(MNIST) image and 10 outputs. I want to get the gradient of one of those outputs wrt the input.

To do this, I do one forward pass by doing output=net(input), where input is a variable with requires_grad=True. After this, I want to do a backward pass, but the pass requires a vector as the input. Does this vector list a kind of weight list where the gradients of different outputs are summed? In particular, since the gradient of only one of those outputs, should I use output.backward([0,0,0,1,0,...,0]) or something else?

Thank you very much in advance!

DiffEverything · March 22, 2017, 4:57pm

Variables have requires_grad=True by default.
What you want to do, if you’re doing MNIST classification, is take the output and labels and compute the CrossEntropyLoss. This will be a scalar variable. You can now do loss.backward() and it will backprop i.e. compute gradients back through to the inputs.

If you want gradOut or gradInp of a layer check out register_hook function.

albanD · March 22, 2017, 5:14pm

@DiffEverything By default the requires_grad is False:

import torch
a = torch.autograd.Variable(torch.Tensor(10,10))
print(a.requires_grad) # prints False

@ckanbak if what you’re looking for is how the input should vary to increase its score for a given digit, then yes, you should wrap the input in a Variable with requires_grad=True and then use what you proposed (note that you may want to convert the list into a Tensor with torch.Tensor([0,0,0,1,0,...,0])).
You can also, given the size 10 vector output and the index of the digit you’re looking for do: current_score = output[selected_digit] and then current_score.backward().

DiffEverything · March 22, 2017, 6:24pm

my bad. I mixed up Parameter and Variable for requires_grad.
sorry.

ckanbak · March 23, 2017, 8:55am

Thanks a lot for your reply!
One more question, I understand that I have to use retain_variables=True if I want to compute the gradient for multiple times. Can I still do that without computing additional variables if I use current_score = output[selected_digit] and then current_score.backward(retain_variables=True)? Or do I have to do it with output.backward(torch.Tensor([0,0,0,1,0,...,0]),retain_variables=True)while changing the position of 1 for different digits?

albanD · March 23, 2017, 10:59am

Hi,

Yes it will work.
I think the simplest way if you want to accumulate the gradients associated with a given set of digits:

for digit in selected_digits:
    output[digit].backward(retrain_variables=True)

Guan-Horng_Liu · May 11, 2017, 2:06pm

@albanD @DiffEverything Hi, thanks for your reply. I have a follow-up question. In my implementation the input is passed through two different networks. I am interested the gradient of the final output w.r.t. each of the input.(i.e. both dz/dx and dz/dy) I tried the following code, but the y.grad is always zero. Currently I can temperately fix it by converting y to numpy and reinitialize the Variable again in the same way as x, then pass through net2. I am wondering if it is possible to get y.grad in the very first pass. Am I missing anything?

net1.zero_grad(); net2.zero_grad()
x = Variable(torch.ones((1,20)), requires_grad=True)
y = net1(x)
z = net2(y)
z.backward(torch.ones(z.size()),retain_variables=True)
print(x.grad) # this will give non-zero value
print(y.grad) # this is all zero

net1.zero_grad(); net2.zero_grad()
y = torch.from_numpy(y.data.numpy())
y = Variable(y, requires_grad=True)
z = net2(y)
z.backward(torch.ones(z.size()),retain_variables=True)
print(y.grad) # this will give non-zero value

smth · May 13, 2017, 4:42am

y.grad is the gradient of a non-leaf variable. To access it’s gradient, you have to use a backward hook:

fromLittleAcorns · January 7, 2019, 5:02pm

Hi, would this still be the preferred approach in version 1.0, I can’t seem to get retrain_variable=True to work and am trying to do the same thing

Thanks

John

albanD · January 7, 2019, 5:23pm

Hi,

You can check the doc here: the option you should use in 1.0 is retain_graph=True.

fromLittleAcorns · January 7, 2019, 5:31pm

Thanks I have looked at that. If I want to get the gradients of each input with respect to each output in a loop such as above then would I need to do
for digit in selected_digits:
output[digit].backward(retain_graph=True)
grad[digit] = input.grad()

If I do this will the gradients coming out of input increment each time or will they be overwritten. I read that gradients are retained and only cleared when the model is zero grad is called. If they are incremental then can I get each individual value but taking the difference of the previous value and the latest?

Thanks again

albanD · January 7, 2019, 7:06pm

They will accumulate.
You can clone the values you want to save and then zero_grad() the gradients before calling the next backward.