Grad returns None or throws error

I have a simple input tensor x and a function f(x) = 2*x. The gradient with respect to x is df/dx = 2. How can I get this result using PyTorch? This is what I tried:

import torch

x = torch.tensor([0.0, 1.0], requires_grad = True)
f = 2*x
f.backward()
print(x.grad)

This throws the following error:

RuntimeError: grad can be implicitly created only for scalar outputs

Then I tried:

x = torch.tensor([0.0, 1.0], requires_grad = True)
f = 2*x
f.backward(torch.FloatTensor([1, 1]))
print(x.grad)

This works and returns tensor([2., 2.]). I am still not sure, though, why f.backward() doesn’t work on its own, or what the torch.FloatTensor([1, 1]) argument means. Ideally I would expect to get 2. and not [2., 2.].

However, the following returns None:

x = torch.tensor([0.0, 1.0], requires_grad = True)
y = 1*x
f = 2*y
f.backward(torch.FloatTensor([1, 1]))
print(y.grad)

Why is this happening?

Hi,

A few things:

  • The .grad field is only populated for leaf tensors (tensors with no history) that require gradients when you call .backward(). In particular, that is why your y at the end does not have its .grad field populated: it is not a leaf. If you want its field to be populated, you can call y.retain_grad() before the backward call (see the sketch after this list).
  • PyTorch uses automatic differentiation (AD), which, for multi-dimensional functions, only computes vector-Jacobian products. In your first example, your function is R^2 -> R^2 (EDIT: corrected from “2D”), so the Jacobian is a 2x2 matrix. You thus need to provide the vector by which it should be multiplied. Since your Jacobian here is diagonal, you can pass [1., 1.] as you did to recover that diagonal.
  • For more mathematical computations like this, I would recommend using the autograd.grad API, as it is simpler for higher-order derivatives and makes it easier to know exactly what you’re doing:
import torch
from torch import autograd

x = torch.tensor([0.0, 1.0], requires_grad = True)
f = 2*x
grad = autograd.grad(outputs=f, inputs=x, grad_outputs=torch.tensor([1.0, 1.0]))[0]
print(grad)  # tensor([2., 2.])

Hi albanD. Thank you for your reply.

Isn’t f = 2*x a 1D tensor, since x is 1D? The Jacobian is then also 1D, i.e. J = tensor([2, 2]). So what is returned is J*grad_outputs where grad_outputs=torch.tensor([1.0, 1.0]), i.e. tensor([2., 2.]).
For a 2D case, e.g. J being a 2x2 tensor, we would have torch.mm(J, grad_outputs) (instead of simple multiplication) where grad_outputs=torch.tensor([[1.0, 1.0], [1.0, 1.0]]). Am I right?

Sorry, my comment was a bit misleading. The tensor f is not 2D, it’s 1D of size 2, so it lives in R^2.
In general, if your function maps R^n -> R^m, the Jacobian will be 2D, of size m x n.
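
As a quick sanity check, here is a small sketch that uses torch.autograd.functional.jacobian to compute the full 2x2 Jacobian of f(x) = 2*x and then reproduces the vector-Jacobian product that backward() computes:

import torch
from torch.autograd.functional import jacobian

x = torch.tensor([0.0, 1.0])

# f maps R^2 -> R^2, so the Jacobian is a 2x2 matrix.
J = jacobian(lambda t: 2*t, x)
print(J)       # tensor([[2., 0.], [0., 2.]])

# backward()/autograd.grad compute v^T J (a vector-Jacobian product);
# with v = [1., 1.] and this diagonal J, that returns the diagonal, [2., 2.].
v = torch.tensor([1.0, 1.0])
print(v @ J)   # tensor([2., 2.])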