Sorry if this is obvious, but I find the description of the torch.autograd.backward(variables, grad_variables, retain_variables=False) function quite confusing.

I’m working on a project where I have a vector of variables that I would like to differentiate to find the Jacobian. When it comes to implementing this, I’m not sure what form grad_variables should take or what a ‘sequence of Tensor’ is. I’ve tried many things, but all throw an error.

Would anyone be able to point me in the direction of an example if one exists? If not, say I had the following super simple example:

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)

What should the arguments for y.backward() be so that I can find [[dy1/dx1, dy1/dx2], [dy2/dx1, dy2/dx2]] (i.e. recover M)?

The naming of grad_variables might be a little bit confusing. In the context of neural networks, it’s the “loss”.

I am a bit confused by this. In this example, isn’t y the “loss”?

Passing torch.FloatTensor([[1, 0]]) as grad_variables selects the first component of y, so x.grad gives the first column of the Jacobian; passing torch.FloatTensor([[0, 1]]) gives the second column.

By default, grad_variables is torch.Tensor([1]), which simply passes the computed gradients through unscaled; this default only makes sense when y is a scalar.
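Putting the pieces above together, here is a runnable sketch of the full Jacobian computation for the toy example (written against the current tensor API, in which Variable is no longer needed and retain_variables has been renamed retain_graph):

```python
import torch

# Same toy example as above: y = x @ M, a row vector with two outputs.
x = torch.tensor([[2., 1.]], requires_grad=True)
M = torch.tensor([[1., 2.], [3., 4.]])
y = torch.mm(x, M)  # shape (1, 2)

jacobian = torch.zeros(2, 2)

# Backprop a one-hot grad-output per component of y.
# retain_graph=True keeps the graph alive for the second backward pass.
y.backward(torch.tensor([[1., 0.]]), retain_graph=True)
jacobian[:, 0] = x.grad[0]  # dy1/dx1, dy1/dx2
x.grad.zero_()              # clear accumulated gradients before the next pass
y.backward(torch.tensor([[0., 1.]]), retain_graph=True)
jacobian[:, 1] = x.grad[0]  # dy2/dx1, dy2/dx2

print(jacobian)  # recovers M
```

Since y = x·M is linear in x, the resulting Jacobian is exactly M, which matches the question above.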

Thanks for your intuitive example.
I’m getting a bit lost on the usage of the following line: y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
especially when it comes to the gradient argument.

As @linlin already mentioned: in the provided example, is the argument gradient=torch.FloatTensor([[1,0]]) used to select a column (e.g. the first one in the above scenario), or does it define the variable that we compute the gradient with respect to (as suggested by http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.backward)?

If it defines the variable we compute the gradient with respect to, why do we write the following?

y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:, 0] = x.grad.data
x.grad.data.zero_()
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:, 1] = x.grad.data

Thanks for the explanation @colesbury.
I just want to know: what if my y is the output of the Inception v3 model? In that case, I would have to run a loop (1000 iterations, one per output class). Is there a better way to do this?
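The loop in question can be wrapped in a small helper; this is a hedged sketch (jacobian_by_loop is a hypothetical name, and a tiny linear layer stands in for Inception v3). Newer PyTorch releases also ship torch.autograd.functional.jacobian, which automates this pattern:

```python
import torch

def jacobian_by_loop(f, x):
    """Compute the Jacobian dy/dx of y = f(x) for a 1-D input x by
    backpropagating one one-hot grad-output per output component."""
    x = x.detach().requires_grad_(True)
    y = f(x)
    n_out, n_in = y.numel(), x.numel()
    jac = torch.zeros(n_out, n_in)
    for i in range(n_out):
        grad_output = torch.zeros_like(y)
        grad_output.view(-1)[i] = 1.0
        # retain_graph=True keeps the graph for the next backward pass
        y.backward(grad_output, retain_graph=True)
        jac[i] = x.grad.view(-1)
        x.grad.zero_()
    return jac

# Tiny stand-in for a network with 3 outputs and 2 inputs
lin = torch.nn.Linear(2, 3, bias=False)
x = torch.tensor([1.0, -2.0])
J = jacobian_by_loop(lin, x)
# For a linear layer y = W x, the Jacobian equals the weight matrix W
```

For 1000 output classes this is 1000 backward passes, which is inherent to reverse-mode autograd: each pass yields one row of the Jacobian.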