Clarification - Using backward() on non-scalars

The naming of grad_variables might be a little bit confusing. In the context of a neural network, it is the gradient of the loss with respect to y (the vector you seed backpropagation with), not the loss itself.
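One way to read that argument: the vector you pass to backward() is treated as the gradient of some downstream scalar (in a network, the loss) with respect to y, and autograd returns the corresponding vector-Jacobian product. Here is a minimal sketch of that equivalence using the same Variable-style API as below (v and x2 are just illustrative names):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
v = torch.FloatTensor([[0.5, -1.0]])     # stand-in for dLoss/dy coming from later layers

torch.mm(x, M).backward(v)               # vector-Jacobian product: v times dy/dx

# Backpropagating the scalar (y * v).sum() produces the same x.grad
x2 = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
(torch.mm(x2, M) * Variable(v)).sum().backward()

print(x.grad.data)                       # identical to ...
print(x2.grad.data)                      # ... this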

Recovering M requires two calls to backward(), one per element of y. Here is how to do it with Variable.backward():

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor([[2, 1]]), requires_grad=True)
M = Variable(torch.FloatTensor([[1, 2], [3, 4]]))
y = torch.mm(x, M)                       # shape (1, 2), so y is not a scalar

jacobian = torch.FloatTensor(2, 2).zero_()

# Column 0: gradient of y[0, 0] with respect to x
y.backward(torch.FloatTensor([[1, 0]]), retain_variables=True)
jacobian[:, 0] = x.grad.data
x.grad.data.zero_()                      # gradients accumulate, so clear them before the next call

# Column 1: gradient of y[0, 1] with respect to x
y.backward(torch.FloatTensor([[0, 1]]), retain_variables=True)
jacobian[:, 1] = x.grad.data
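The same recipe generalizes to a loop over one-hot vectors. A sketch in a more recent PyTorch release (the assumption here is a version where Variable is merged into Tensor and retain_variables has been renamed to retain_graph):

import torch

x = torch.tensor([[2., 1.]], requires_grad=True)
M = torch.tensor([[1., 2.], [3., 4.]])
y = torch.mm(x, M)

jacobian = torch.zeros(2, 2)
for j in range(y.size(1)):
    one_hot = torch.zeros(1, y.size(1))
    one_hot[0, j] = 1.
    # retain_graph keeps the graph alive so backward() can be called again
    y.backward(one_hot, retain_graph=True)
    jacobian[:, j] = x.grad[0]           # gradient of y[0, j] with respect to x
    x.grad.zero_()                       # gradients accumulate; clear before the next pass

print(jacobian)                          # equal to M, as above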

Each of these y.backward() calls is equivalent to calling torch.autograd.backward() directly:

torch.autograd.backward([y], [torch.FloatTensor([[1, 0]])], retain_variables=True)
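The second column is recovered by the analogous call with [[0, 1]]. In newer releases (an assumption about version, after the retain_variables to retain_graph rename), the same call would read:

torch.autograd.backward([y], [torch.tensor([[1., 0.]])], retain_graph=True)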
