Clarification - Using backward() on non-scalars

Mattie · March 14, 2017, 6:58pm

Hey,

Sorry if this is obvious, but I find the description of the torch.autograd.backward(variables, grad_variables, retain_variables=False) function quite confusing.

I’m working on a project where I have a vector of variables that I would like to differentiate to find the Jacobian. When it comes to implementing this, I’m not sure what form grad_variables should take or what a ‘sequence of Tensor’ is. I’ve tried many things, but they all throw errors.

Would anyone be able to point me in the direction of an example if one exists? If not, say I had the following super simple example:

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x,M)

What should the arguments for y.backward() be so that I can find [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] (i.e. recover M)?

colesbury · March 14, 2017, 8:05pm

The naming of grad_variables might be a little bit confusing. In the context of neural networks, it’s the “loss”.

Recovering M requires two calls to backward(). Here’s how with Variable.backward():

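import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)
jacobian = torch.FloatTensor(2, 2).zero_()
# weight the outputs with [1, 0]: x.grad becomes dy1/dx, stored as column 0
# (retain_variables keeps the graph for a second backward; later PyTorch releases call it retain_graph)
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
# gradients accumulate across backward calls, so clear them first
x.grad.data.zero_()
# weight the outputs with [0, 1]: x.grad becomes dy2/dx, stored as column 1
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data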

The y.backward() calls above are equivalent to calling torch.autograd.backward() directly, for example:

torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)
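For readers on a current PyTorch release, here is a sketch of the same two-call approach (an adaptation, not the original answer): Variable is no longer needed because tensors track gradients directly, and retain_variables is now called retain_graph. This version stores one row per output, [dy_j/dx1, dy_j/dx2], which is the layout asked for in the question, so the result is M transposed; the code above fills columns instead and therefore reproduces M.

```python
import torch

x = torch.tensor([[2.0, 1.0]], requires_grad=True)
M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.mm(x, M)  # shape (1, 2)

n_out = y.shape[1]
# jacobian[j, i] = dy_j / dx_i, one row per output
jacobian = torch.zeros(n_out, x.shape[1])
for j in range(n_out):
    # one-hot weighting that selects output y_j
    v = torch.zeros_like(y)
    v[0, j] = 1.0
    x.grad = None  # drop gradients accumulated by the previous call
    y.backward(v, retain_graph=True)  # keep the graph for the next call
    jacobian[j] = x.grad[0]

print(jacobian)  # equals M.T
```

Each iteration is one vector-Jacobian product, so an output with n entries needs n backward calls this way.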

Mattie · March 14, 2017, 8:14pm

Fab, that’s a great help. Thanks for your time.

linlin · October 19, 2017, 4:06pm

The naming of grad_variables might be a little bit confusing. In the context of neural networks, it’s the “loss”.

I am a bit confused by this. In this example, isn’t the “loss” y?

torch.FloatTensor([[1, 0]]) is passed to grad_variables and specifies that we’ll use the first column of y’s gradients, and torch.FloatTensor([[0, 1]]) for the second column.

By default, grad_variables is torch.Tensor([1]), which means we just keep the calculated gradients.

Is that correct?

Thanks a lot!
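A small check of the “loss” reading, sketched for current PyTorch (plain tensors instead of Variable): calling y.backward(v) on a non-scalar y fills x.grad with the same values as calling backward() on the scalar L = sum(v * y). So v acts as dL/dy, the gradient of some downstream scalar with respect to y, rather than the loss itself, and a one-hot v picks out a single output’s gradient. In current PyTorch, backward() with no argument is only allowed for single-element (scalar) outputs, where the implicit gradient is 1.

```python
import torch

x = torch.tensor([[2.0, 1.0]], requires_grad=True)
M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
v = torch.tensor([[1.0, 0.0]])  # the "gradient" argument, i.e. dL/dy

# (a) Non-scalar y: pass the weighting v explicitly.
y = torch.mm(x, M)
y.backward(v)
grad_a = x.grad.clone()

# (b) The same gradient via an explicit scalar "loss" L = sum(v * y).
x.grad = None
L = (v * torch.mm(x, M)).sum()
L.backward()  # scalar output, so the implicit gradient is 1
grad_b = x.grad.clone()

print(grad_a, grad_b)  # both equal v @ M.T, i.e. dy1/dx
```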

jsm · November 19, 2017, 2:59pm

Thanks for your intuitive example.
I’m getting a bit lost on the usage of the following line:
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
especially when it comes to the gradient argument.

Following up on what @linlin mentioned: in the provided example, is the argument gradient=torch.FloatTensor([[1,0]])
used to extract the information for one column (e.g. the first one in the above scenario), or does it define the variable
that we compute the gradient with respect to (as the docs at http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.backward seem to suggest)?

If it does define the variable we differentiate with respect to, why
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data

is used instead of

y.backward(x * torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(x * torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data

for computing [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] ?

Thank you in advance!
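A sketch (current PyTorch, no Variable) of what the second version would compute: the gradient argument only weights the outputs of y, while the variable being differentiated with respect to is always the leaf x whose .grad gets filled. Weighting by x * [1, 0] instead of [1, 0] therefore returns the same Jacobian row multiplied by x1 = 2, not a different derivative. The weighting is passed as a detached constant here.

```python
import torch

x = torch.tensor([[2.0, 1.0]], requires_grad=True)
M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
e1 = torch.tensor([[1.0, 0.0]])

y = torch.mm(x, M)

# One-hot weighting: x.grad is the Jacobian row dy1/dx.
y.backward(e1, retain_graph=True)
row_onehot = x.grad.clone()

# Weighting by x * e1 (as a constant): the same row, scaled by x1 = 2.
x.grad = None
y.backward(x.detach() * e1)
row_scaled = x.grad.clone()

print(row_onehot, row_scaled)
```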

jdhao · November 21, 2017, 8:19am

I think the derivative of y w.r.t. x is not M but the transpose of M:
dy1/dx = [M11, M21], dy2/dx = [M12, M22].
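The same point worked out for the example in the question, treating x and y as row vectors with y = xM. Writing the Jacobian with one row per output, as the question does, gives M transposed, and y.backward(v) stores the row vector vJ in x.grad:

```latex
y_j = \sum_{i} x_i M_{ij}
\quad\Longrightarrow\quad
\frac{\partial y_j}{\partial x_i} = M_{ij},
\qquad
J =
\begin{bmatrix}
\partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\
\partial y_2/\partial x_1 & \partial y_2/\partial x_2
\end{bmatrix}
=
\begin{bmatrix}
M_{11} & M_{21} \\
M_{12} & M_{22}
\end{bmatrix}
= M^{\top},
\qquad
\texttt{x.grad} = v\,J = v\,M^{\top}.
```

With the question's M = [[1, 2], [3, 4]], this gives J = [[1, 3], [2, 4]], and v = [1, 0] yields dy1/dx = [1, 3].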

Also, see here for the meaning of the gradient argument of the backward() method.

jsm · November 21, 2017, 9:52am

Great answer @jdhao! This resolves my confusion. Thanks a lot!!!
I guess this answers @linlin’s question.

jdhao · November 21, 2017, 10:03am

Cheers! Stack Overflow is your friend.

linlin · November 21, 2017, 1:10pm

Thanks @jsm and @jdhao, that cleared things up.

But I still think the naming of grad_variables is a bit misleading. Something like grad_weighting would be more intuitive.

yottabytt · December 15, 2017, 5:44pm

The Stack Overflow question and answers were perfect. Thanks for the link to the SO thread, @jdhao.

jdhao · December 15, 2017, 5:47pm

If you have a Stack Overflow account, you can upvote that answer to support it.

saan77 · March 15, 2018, 4:58pm

Thanks for the explanation @colesbury.
I’d like to know what to do if my y is the output of the Inception v3 model. In that case, I would have to run a loop of 1000 iterations, one per output. Is there a better way to do this?
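Reverse mode still needs one vector-Jacobian product per output, but PyTorch releases well after this thread can batch those products instead of running a Python loop. A hedged sketch: a tiny nn.Linear with 1000 outputs is a hypothetical stand-in for Inception v3, since the batched call is the point rather than the model. It shows torch.autograd.functional.jacobian with vectorize=True and, equivalently, torch.autograd.grad with is_grads_batched=True and an identity matrix of one-hot weightings.

```python
import torch

# Hypothetical stand-in for a large classifier: 8 inputs, 1000 outputs.
torch.manual_seed(0)
model = torch.nn.Linear(8, 1000)
x = torch.randn(1, 8)

# Option 1: build the full Jacobian; vectorize=True batches the
# 1000 vector-Jacobian products instead of looping in Python.
J = torch.autograd.functional.jacobian(model, x, vectorize=True)
print(J.shape)  # output shape (1, 1000) followed by input shape (1, 8)

# Option 2: one batched backward pass, one one-hot weighting per output.
xg = x.clone().requires_grad_(True)
y = model(xg)                        # shape (1, 1000)
eye = torch.eye(1000).unsqueeze(1)   # shape (1000, 1, 1000): batch of weightings
(J2,) = torch.autograd.grad(y, xg, grad_outputs=eye, is_grads_batched=True)
print(J2.shape)  # (1000, 1, 8): one gradient w.r.t. xg per output
```

For a linear layer the Jacobian is just model.weight, which makes the result easy to check. For a real image input the full Jacobian has (number of outputs) × (number of input pixels) entries, so it may be very large in memory.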

Jolyon · February 19, 2019, 8:17pm

OMG, thank you so much! I wasted tons of time trying to understand the so-called gradient argument for non-scalar outputs, and you made it clear with one word: “loss”.

Huimin_ZENG · December 4, 2019, 5:22pm

@saan77 Hi! Did you solve it?