Clarification - Using backward() on non-scalars

Hey,

Sorry if this is obvious, but I find the description of the torch.autograd.backward(variables, grad_variables, retain_variables=False) function quite confusing.

I’m working on a project where I have a vector of variables that I would like to differentiate to find the Jacobian. When it comes to implementing this, I’m not sure what form grad_variables should be or what a ‘sequence of Tensor’ is. I’ve tried many things, but all throw an error.

Would anyone be able to point me in the direction of an example if one exists? If not, say I had the following super simple example:

x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)

What should the arguments for y.backward() be so that I can find [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] (i.e. recover M)?


The naming of grad_variables might be a little bit confusing. In the context of neural networks, it’s the “loss”.

To recover M requires two calls to backward(). Here's how with Variable.backward():

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)
jacobian = torch.FloatTensor(2, 2).zero_()

# Backprop 1*y1 + 0*y2 to fill the first column with dy1/dx ...
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()

# ... then 0*y1 + 1*y2 to fill the second column with dy2/dx.
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data

You can also replace the y.backward() calls with the equivalent:

torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)
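
For reference, here is that equivalent form as a small self-contained sketch (same toy example, same Variable / retain_variables API as above; in later PyTorch releases retain_variables was renamed retain_graph):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)

# Same effect as y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)
print(x.grad.data)  # [[1, 3]], i.e. dy1/dx1 and dy1/dx2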


Fab, that’s a great help. Thanks for your time.


The naming of grad_variables might be a little bit confusing. In the context of neural networks, it’s the “loss”.

I am a bit confused by this. In this example, isn’t the “loss” y itself?

torch.FloatTensor([[1, 0]]) is passed to grad_variables and specifies that we’ll use the first column of y’s gradients, and torch.FloatTensor([[0, 1]]) for the second column.

By default, grad_variables is torch.Tensor([1]), which means the calculated gradients are simply kept as they are.
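
For what it’s worth, here is a small sketch that illustrates this reading, reusing the same toy x, M and y from above (the tensor passed to backward() just weights y’s components):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)  # y = [[5, 8]], dy1/dx = [1, 3], dy2/dx = [2, 4]

# Backprop the weighted "loss" 2*y1 + 3*y2
y.backward(torch.FloatTensor([[2, 3]]))
print(x.grad.data)  # [[8, 18]] = 2*[1, 3] + 3*[2, 4]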

Is that correct?

Thanks a lot!

Thanks for your intuitive example.
I’m getting a bit lost on the usage of the following line:
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
especially when it comes to the gradient argument.

As @linlin already mentioned, is the argument gradient=torch.FloatTensor([[1,0]]) in the provided example used to pick out one column (e.g. the first one in the above scenario), or does it define the variable that we compute the gradient with respect to (as suggested by http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.backward)?

If it defines the variable we compute the gradient with respect to, why
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data

is used instead of

y.backward(x * torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(x * torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data

for computing [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] ?

Thank you in advance!

I think the derivative of y w.r.t. x is not M but the transpose of M:
dy1/dx = [M11, M21], dy2/dx = [M12, M22].

Also, see here for the meaning of the gradient argument in the backward() method.
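
A quick numeric sketch of this, reusing the toy example from above (the rows of the assembled Jacobian are dy1/dx and dy2/dx, and the result matches M.t(), not M):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)
jacobian = torch.FloatTensor(2, 2).zero_()

y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[0] = x.grad.data.view(-1)   # row 0: dy1/dx = [1, 3]
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]))
jacobian[1] = x.grad.data.view(-1)   # row 1: dy2/dx = [2, 4]

print(jacobian)    # [[1, 3], [2, 4]]
print(M.t().data)  # identical: dy/dx is the transpose of M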


Great answer @jdhao! This clears up my confusion. Thanks a lot!!!
I guess this also answers @linlin’s question.

Cheers! Stack Overflow is your friend :slight_smile:

Thanks @jsm and @jdhao, that cleared things up.

But I still think the naming of grad_variables is a bit misleading. Something like grad_weighting would be more intuitive.

Stackoverflow’s question and answers were perfect. Thanks for the link to SO thread @jdhao.

You can upvote that answer to support it if you have a Stack Overflow account :slight_smile:


Thanks for the explanation @colesbury.
I just want to know: what if my y is the output of the inceptionv3 model? In that case I have to run a loop (1000 iterations, one backward pass per output). Is there a better way to do this?
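
Here is roughly what I mean by that loop, as a sketch only (a tiny made-up linear model stands in for inceptionv3, and the 4/3 sizes are placeholders; for the real model n_out would be 1000):

import torch
import torch.nn as nn
from torch.autograd import Variable

# Hypothetical stand-in for inceptionv3: 4 input features -> 3 outputs
toy_model = nn.Linear(4, 3)

x = Variable(torch.randn(1, 4), requires_grad=True)
y = toy_model(x)          # shape (1, 3); think (1, 1000) for inceptionv3
n_out = y.size(1)

jacobian = torch.zeros(n_out, x.data.numel())
for i in range(n_out):
    grad_output = torch.zeros(1, n_out)
    grad_output[0][i] = 1                     # select output i
    y.backward(grad_output, retain_variables=True)
    jacobian[i] = x.grad.data.view(-1)        # row i: d y_i / d x
    x.grad.data.zero_()

print(jacobian.size())    # (3, 4) -- one backward pass per output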

OMG, thank u so much! I wasted tons of time on understanding so-called gradient on the non-scalar output and you make it clear using one word: “loss” :laughing:

@saan77 Hi! Did you solve it?