How to get the gradients for both the input and intermediate variables?

Assuming the following forward pass: x → y → z, in which
x is a scalar
y = x * x
z = 2 * y

I want to track the following derivatives/gradients, including the gradient with respect to the intermediate variable y:

  • dz/dy, gradient of z with respect to y, which should be “2”
  • dy/dx, gradient of y with respect to x, which should be “2x”
  • dz/dx, gradient of z with respect to x, which should be “4x”
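
(The last value follows from the chain rule: dz/dx = dz/dy · dy/dx = 2 · 2x = 4x.)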

So, I initialized both x and y with the requires_grad=True argument. However, I can only get y.grad, which is “2”, while x.grad returns “None”, as shown below.
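
For reference, this is essentially what I ran (using the same x = 0.3 as in the later snippets):

import torch

x = torch.tensor(0.3, requires_grad=True)

# y is recreated as a new tensor so that requires_grad=True can be set on it
y = torch.tensor(x * x, requires_grad=True)

z = 2 * y
z.backward()

print(y.grad)
# [output] tensor(2.)

print(x.grad)
# [output] None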

May I ask:

  • Why am I unable to get x.grad (dz/dx) in this case?
  • How can I get the gradients for both the input and the intermediate variable via .backward()?

You are detaching the computation graph by recreating a tensor in:

 y = torch.tensor(x * x)

Use y = x * x instead.
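
You can see the difference by checking the .grad_fn attribute, e.g. with a minimal sketch like this (reusing x = 0.3 from your snippets):

import torch

x = torch.tensor(0.3, requires_grad=True)

# Recreating the tensor creates a new leaf without a grad_fn,
# so the graph back to x is cut here.
y_detached = torch.tensor(x * x)
print(y_detached.grad_fn)
# [output] None

# Using the expression directly keeps y attached to the graph.
y = x * x
print(y.grad_fn)
# [output] <MulBackward0 object at 0x...>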

PS: you can post code snippets by wrapping them in three backticks ```, which makes debugging easier :wink:

Thanks, @ptrblck.

However, it seems y.grad is still not tracked when using y = x * x:

import torch

x = torch.tensor(0.3, requires_grad=True)
print(x)
# [output] tensor(0.3000, requires_grad=True)

y = x * x
print(y)
# [output] tensor(0.0900, grad_fn=<MulBackward0>)

z = 2 * y
print(z)
# [output] tensor(0.1800, grad_fn=<MulBackward0>)

z.backward()

print(y.grad)
# [output] None

print(x.grad)
# [output] tensor(1.2000)

Anyway, I found this post that’s relevant to my question, and I’ll digest it first.

Your code should raise a warning, which explains the reason and the workaround:

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  print(y.grad)

After adding y.retain_grad() you’ll get the gradient value for y.
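
If you don’t necessarily need to go through .backward(), torch.autograd.grad would also give you both gradients in a single call. A small sketch with the same values:

import torch

x = torch.tensor(0.3, requires_grad=True)
y = x * x
z = 2 * y

# Returns the gradients of z w.r.t. each listed input,
# without populating their .grad attributes.
dz_dx, dz_dy = torch.autograd.grad(z, (x, y))

print(dz_dy)
# [output] tensor(2.)

print(dz_dx)
# [output] tensor(1.2000)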


Thanks, @ptrblck.

I’m running the test in a Jupyter notebook in VS Code, and I’m not sure why I don’t see the warning message you pointed out.

Anyway, adding y.retain_grad() does work in this case as you suggested:

x = torch.tensor(0.3, requires_grad=True)
print(x)
# [output] tensor(0.3000, requires_grad=True)

y = x * x
print(y)
# [output] tensor(0.0900, grad_fn=<MulBackward0>)

y.retain_grad()

z = 2 * y
print(z)
# [output] tensor(0.1800, grad_fn=<MulBackward0>)

z.backward()

print(y.grad)
# [output] tensor(2.)

print(x.grad)
# [output] tensor(1.2000)
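
Both values now match the analytic gradients from the chain rule: y.grad = dz/dy = 2, and x.grad = dz/dx = 4 · 0.3 = 1.2.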