# Clarification - Using backward() on non-scalars

Hey,

Sorry if this is obvious, but I find the description of the `torch.autograd.backward(variables, grad_variables, retain_variables=False)` function quite confusing.

I’m working on a project where I have a vector of variables that I would like to differentiate to find the Jacobian. When it comes to implementing this, I’m not sure what form `grad_variables` should take, or what a ‘sequence of Tensor’ is. I’ve tried many things, but all throw an error.

Would anyone be able to point me in the direction of an example if one exists? If not, say I had the following super simple example:

```
x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)
```

What should the arguments for `y.backward()` be so that I can find `[[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]]` (i.e. recover M)?

6 Likes

The naming of `grad_variables` might be a little bit confusing. In the context of neural networks, it’s the “loss”.

To recover M requires two calls to `backward()`. Here’s how with `Variable.backward()`:

```
import torch
from torch.autograd import Variable
x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)
jacobian = torch.FloatTensor(2, 2).zero_()
# One backward pass per output: [1, 0] selects y1, [0, 1] selects y2.
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data
```

You can also replace the `y.backward()` calls with the equivalent:

`torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)`
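
For readers on a newer PyTorch release, roughly the same recipe looks like this (a sketch only, assuming a version where `retain_variables` has been renamed `retain_graph` and where plain tensors with `requires_grad=True` replace `Variable`):

```
import torch

x = torch.tensor([[2., 1.]], requires_grad=True)
M = torch.tensor([[1., 2.], [3., 4.]])
y = x @ M  # same as torch.mm(x, M)

jacobian = torch.zeros(2, 2)
y.backward(torch.tensor([[1., 0.]]), retain_graph=True)
jacobian[:, 0] = x.grad[0]  # gradient of y1 w.r.t. x
x.grad.zero_()
y.backward(torch.tensor([[0., 1.]]), retain_graph=True)
jacobian[:, 1] = x.grad[0]  # gradient of y2 w.r.t. x
# jacobian now holds [[1., 2.], [3., 4.]], i.e. it recovers M
```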

13 Likes

Fab, that’s a great help. Thanks for your time.

1 Like

> The naming of `grad_variables` might be a little bit confusing. In the context of neural networks, it’s the “loss”.

I am a bit confused by this. In this example, isn’t the “loss” y itself?

`torch.FloatTensor([[1, 0]])` is passed to `grad_variables` and specifies that we’ll use the first column of y’s gradients, and `torch.FloatTensor([[0, 1]])` for the second column.

By default, `grad_variables` is `torch.Tensor([1])`, which means we just keep the calculated gradients as they are.

Is that correct?

Thanks a lot!
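
For what it’s worth, one way to state this precisely: `y.backward(v)` accumulates the vector-Jacobian product v · (dy/dx) into `x.grad`. So v = `[1, 0]` gives `x.grad = [dy1/dx1, dy1/dx2]`, v = `[0, 1]` gives `x.grad = [dy2/dx1, dy2/dx2]`, and the default v = `torch.Tensor([1])` for a scalar output simply leaves the gradient unweighted.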

I’m getting a bit lost on the usage of the following line:
`y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)`
especially when it comes to the `gradient` argument.

As @linlin already mentioned, in the provided example the argument `gradient=torch.FloatTensor([[1,0]])` is used either to acquire information for a single column (e.g. the first one in the above scenario) or to define the variable we differentiate with respect to.

If it is used with respect to a given variable, why do we write

```
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data
```

rather than

```
y.backward(x * torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:,0] = x.grad.data
x.grad.data.zero_()
y.backward(x * torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:,1] = x.grad.data
```

for computing `[[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]]`?

I think the derivative of y w.r.t. x is not M but the transpose of M:
dy1/dx = [M11, M21], dy2/dx = [M12, M22].

Also, you can see here for the meaning of the `gradient` argument in the `backward()` method.
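
A quick numeric check of this, reusing the example from the top of the thread (a sketch in the same old-style `Variable` API; here M = [[1, 2], [3, 4]]):

```
import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)

y.backward(torch.FloatTensor([[1, 0]]))
print(x.grad.data)  # values [1, 3] = [M11, M21] = dy1/dx, the first row of M^T
```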

2 Likes

Great answer @jdhao! This clears up my confusion. Thanks a lot!!!
I guess this also answers @linlin’s question.

Thanks to @jsm and @jdhao. That cleared things up.

But I still think the naming of `grad_variables` is a bit misleading. Something like `grad_weighting` would be more intuitive.

I just want to know: what if my `y` is the output of an inceptionv3 model? In that case, I would have to run a loop (1000 iterations, one per output). Is there a better way to do this?
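
For reference, here is the loop that question describes, generalized from the example above (a sketch only, written against a newer PyTorch API where `retain_variables` is called `retain_graph`; the helper name `jacobian_by_loop` is made up for illustration):

```
import torch

def jacobian_by_loop(y, x):
    # Build J[j, i] = dy_j / dx_i with one backward pass per output element.
    n_out, n_in = y.numel(), x.numel()
    J = torch.zeros(n_out, n_in)
    for j in range(n_out):
        if x.grad is not None:
            x.grad.zero_()             # gradients accumulate, so reset first
        grad_output = torch.zeros_like(y)
        grad_output.view(-1)[j] = 1.0  # unit vector selecting output y_j
        y.backward(grad_output, retain_graph=True)
        J[j] = x.grad.view(-1)         # row j = gradient of y_j w.r.t. x
    return J

x = torch.tensor([[2., 1.]], requires_grad=True)
M = torch.tensor([[1., 2.], [3., 4.]])
y = x @ M
print(jacobian_by_loop(y, x))  # [[1., 3.], [2., 4.]], i.e. the transpose of M
```

Newer PyTorch releases also provide `torch.autograd.functional.jacobian`, which wraps this kind of loop into a single call.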