# Clarification - Using backward() on non-scalars

#1

Hey,

Sorry if this is obvious, but I find the description of the `torch.autograd.backward(variables, grad_variables, retain_variables=False)` function quite confusing.

I’m working on a project where I have a vector of variables that I would like to differentiate to find the Jacobian. When it comes to implementing this, I’m not sure what form `grad_variables` should be or what a ‘sequence of Tensor’ is. I’ve tried many things, but all throw an error.

Would anyone be able to point me in the direction of an example if one exists? If not, say I had the following super simple example:

`x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)`
`M = Variable(torch.FloatTensor([[1,2],[3,4]]))`
`y = torch.mm(x,M)`

What should the arguments for `y.backward()` be so that I can find `[[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]]` (i.e. recover M)?

(colesbury) #2

The naming of `grad_variables` might be a little bit confusing. In the context of neural networks, it’s the “loss”.

Recovering M requires two calls to `backward()`, one per output. Here’s how with `Variable.backward()`:

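```
import torch
from torch.autograd import Variable
x = Variable(torch.FloatTensor([[2,1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1,2],[3,4]]))
y = torch.mm(x, M)
jacobian = torch.FloatTensor(2, 2).zero_()
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)  # selects y1
jacobian[:,0] = x.grad.data
x.grad.data.zero_()  # gradients accumulate, so reset before the next pass
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)  # selects y2
jacobian[:,1] = x.grad.data
```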

Each `y.backward()` call is equivalent to a call of this form:

`torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)`
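For readers on more recent PyTorch releases, where `Variable` has been merged into `Tensor` and `retain_variables` has been renamed `retain_graph`, the same idea can be sketched roughly as below. This is only an illustration under those assumptions, not the code from the post: it uses `torch.autograd.grad` with one-hot `grad_outputs` instead of `backward()`, so nothing accumulates in `x.grad`. The names `rows` and `v` are made up for the example.

```python
import torch

# Same values as the example above, written with plain tensors.
x = torch.tensor([[2.0, 1.0]], requires_grad=True)
M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = x @ M  # shape (1, 2)

rows = []
for j in range(y.shape[1]):
    v = torch.zeros_like(y)
    v[0, j] = 1.0  # one-hot selector for output y_j
    # grad() returns the vector-Jacobian product, here the gradient of y_j w.r.t. x
    (g,) = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)
    rows.append(g.reshape(-1))

jacobian = torch.stack(rows)  # jacobian[j, i] = dy_j / dx_i
print(jacobian)
```

Stacking the per-output gradients as rows gives the Jacobian in the usual layout (outputs along rows, inputs along columns); writing them into columns, as the original snippet does, gives its transpose.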

#3

Fab, that’s a great help. Thanks for your time.

#4

> The naming of `grad_variables` might be a little bit confusing. In the context of neural networks, it’s the “loss”.

I am a bit confused by this. In this example, isn’t the “loss” y?

`torch.FloatTensor([[1, 0]])` is passed to `grad_variables` and specifies that we’ll use the first column of y’s gradients, and `torch.FloatTensor([[0, 1]])` for the second column.

By default, `grad_variables` is `torch.Tensor()` which means we will just keep calculated gradients.

Is that correct?

Thanks a lot!
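One way to read the argument, offered here as a hedged sketch rather than an official definition: passing a tensor `v` to `y.backward(v)` appears to give the same gradient as building the scalar `(v * y).sum()` and calling `backward()` on it with no argument. In that reading, `v` plays the role of the gradient of some downstream scalar with respect to `y`. The snippet assumes a recent PyTorch release with plain tensors; the variable names are chosen for the example.

```python
import torch

x = torch.tensor([[2.0, 1.0]], requires_grad=True)
M = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
v = torch.tensor([[1.0, 0.0]])  # the weights passed to backward()

# Route 1: pass v directly as the gradient argument.
y = x @ M
y.backward(v)
grad_from_argument = x.grad.clone()

# Route 2: build an explicit scalar and call backward() with no argument.
x.grad.zero_()  # clear the gradient accumulated by route 1
scalar = (v * (x @ M)).sum()
scalar.backward()
grad_from_scalar = x.grad.clone()

print(grad_from_argument, grad_from_scalar)
```

With `v = [[1, 0]]` the scalar is just `y1`, so both routes return the gradient of the first output alone.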

#5

Thanks for your intuitive example.
I’m getting a bit lost on the usage of the following line:
`y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)`
especially when it comes to the `gradient` argument.

As @linlin already mentioned, in the provided example, is the argument `gradient=torch.FloatTensor([[1,0]])`
used to select a column (e.g. the first one in the above scenario), or to define the variable
that we compute the gradient with respect to (as suggested by http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.backward)?

If it defines the variable we differentiate with respect to, why is it that
`y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)`
`jacobian[:,0] = x.grad.data`
`x.grad.data.zero_()`
`y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)`
`jacobian[:,1] = x.grad.data`

is used instead of

`y.backward(x * torch.FloatTensor([[1, 0]]), retain_variables=True)`
`jacobian[:,0] = x.grad.data`
`x.grad.data.zero_()`
`y.backward(x * torch.FloatTensor([[0, 1]]), retain_variables=True)`
`jacobian[:,1] = x.grad.data`

for computing `[[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]]` ?

Thank you in advance!

(jdhao) #6

I think the derivative of y w.r.t. x is not M but the transpose of M.
dy1/dx = [M11, M21], dy2/dx = [M12, M22].

Also, you can see here for the meaning of `gradient` argument in `backward()` method.
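The transpose claim above can be checked by hand. The following is a short derivation under the thread's own setup (row vector $x$, $y = xM$); the symbols $J$ and $v$ are introduced here for illustration only.

$$
y_j = \sum_i x_i M_{ij}
\quad\Longrightarrow\quad
\frac{\partial y_j}{\partial x_i} = M_{ij}
$$

$$
J =
\begin{bmatrix}
\partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\
\partial y_2/\partial x_1 & \partial y_2/\partial x_2
\end{bmatrix}
=
\begin{bmatrix}
M_{11} & M_{21} \\
M_{12} & M_{22}
\end{bmatrix}
= M^{\top}
=
\begin{bmatrix}
1 & 3 \\
2 & 4
\end{bmatrix}
$$

Calling `backward()` with a row vector $v$ yields the product $vJ$, so $v = [1, 0]$ picks out the first row of $J$:

$$
[1, 0]\, J = [M_{11},\ M_{21}] = [1,\ 3]
$$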

#7

Great answer @jdhao! This solves my ambiguities. Thanks a lot!!!
I guess this answers @linlin’s question.

(jdhao) #8

cheers! Stackoverflow is your friend

#9

Thanks, @jsm and @jdhao. That cleared things up.

But I still think the naming of `grad_variables` is a bit misleading. Something like `grad_weighting` would be more intuitive.

(Yottabytt) #10

Stackoverflow’s question and answers were perfect. Thanks for the link to SO thread @jdhao.

(jdhao) #11

You can upvote that answer to support it if you have a Stack Overflow account

#12

Thanks for the explanation @colesbury.
What if my `y` is the output of the Inception v3 model? In that case, I would have to run a loop (1000 iterations, one per output). Is there a better way to do this?
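One option that may help in recent PyTorch releases is `torch.autograd.functional.jacobian`, which wraps the per-output loop and, with `vectorize=True` (described as experimental in the documentation), tries to batch the backward passes. The sketch below uses a small hypothetical stand-in function instead of Inception v3; `model`, `W`, and the sizes are assumptions made for the example, and the explicit loop is included only as a reference to compare against.

```python
import torch
from torch.autograd.functional import jacobian

# Hypothetical stand-in for a large model: 8 inputs mapped to 5 outputs.
torch.manual_seed(0)
W = torch.randn(8, 5)

def model(x):
    return torch.tanh(x @ W)

x = torch.randn(8)

# vectorize=True asks autograd to batch the per-output backward passes.
J = jacobian(model, x, vectorize=True)  # shape (5, 8): J[j, i] = d out_j / d x_i

# Reference: the explicit loop, one backward pass per output.
x_req = x.clone().requires_grad_(True)
out = model(x_req)
J_loop = torch.stack([
    torch.autograd.grad(out[j], x_req, retain_graph=True)[0]
    for j in range(out.numel())
])

print(J.shape, torch.allclose(J, J_loop, atol=1e-6))
```

The total work still grows with the number of outputs, so for a 1000-way classifier the full Jacobian stays expensive. If only the gradient of a single scalar (such as a loss or one class score) is needed, one backward pass is enough and no Jacobian has to be built.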

(Zhuolun(Jolyon) Li) #13

OMG, thank u so much! I wasted tons of time on understanding so-called gradient on the non-scalar output and you make it clear using one word: “loss” 