Why is .backward() giving different gradients than the analytical computation when retain_graph=True?

Here is my code and reasoning; kindly let me know if my calculation is incorrect.

import torch
import torch.nn.functional as F

# Consider Y = 3X; we take X = 1, Y = 4, and perturb X by alpha = 0.001 to get a new X
X = torch.ones(1).requires_grad_()
Y1 = torch.ones(1).requires_grad_()*4
Y2 = torch.ones(1).requires_grad_()*4
p_y1 = 3*X
l1 = F.mse_loss(p_y1, Y1)
grad = torch.autograd.grad(l1, [X], retain_graph=True)[0]
print("grad1:", grad)
nX = X + grad.data*0.001  # perturbed X
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
print("Final_y:", final_y)
l2 = F.mse_loss(final_y, Y2)
grad = torch.autograd.grad(l2, [X])[0]
print("Loss: ",l2, "Final grad: ", grad)

This gives output:

grad1: tensor([-6.])
Final_y: tensor([2.9910], grad_fn=<DivBackward0>)
Loss:  tensor(1.0181, grad_fn=<MseLossBackward0>) Final grad:  tensor([-6.0540])

Now I try to compute this analytically:

# loss = (y - 3x)^2  ==>  grad_x = 2*(y - 3x)*(-3) = -6*(y - 3x)
y = 4
x = 1
grad1 = -6*(y-3*x)
grad1

This gives the output:
-6
This is as expected.

"""
Now, x get's modified as: x2 = x + 0.001*grad1
we get new Y as: 3*( x + 0.001*grad1) = 3*(x -0.006* (y - 3x) ) = 3x -0.018y +0.054x
And this is averaged with old y: 3*x to get final output: (3x + 3x -0.018y +0.054x)/2 = 3x +0.027x - 0.009y
"""
final_p = 3*x + 0.027*x - 0.009*y
final_p  # This is the final y; so far so good

This gives the output:
2.991
This is also as expected.

"""
Now, wrt this what will be grad X?
loss = (y - 3x -0.027x + 0.009y)^2 #Note here two y's have same value...so...
grad_x = 2(y - 3x -0.027x + 0.009y) (-3 - 0.027)
"""
(y - 3*x -0.027*x + 0.009*y)**2 #This is loss also seems reasonable

This gives the output:
1.0180809999999998
This is also as expected (it matches the 1.0181 above, more or less).

2*(y - 3*x - 0.027*x + 0.009*y)*(-3 - 0.027)  # This is the final grad on X. WHY THE DIFFERENCE?
This gives the output:
-6.108486
This is different :frowning: Why did PyTorch calculate -6.0540?
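
As a sanity check on where -6.0540 might come from (my own arithmetic, assuming the perturbation is treated as a constant rather than as a function of x): if delta = -0.006 is a constant, then final_y = (3x + 3*(x + delta))/2 = 3x + 1.5*delta, so d(final_y)/dx is just 3. Reusing the x and y values from above:

delta = -0.006
final_y_const = 3*x + 1.5*delta  # = 2.991, the same value as before
2*(final_y_const - y)*3          # chain rule, with d(final_y)/dx = 3

This gives -6.054, which is exactly the number PyTorch reports.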

EDIT: It seems that PyTorch is treating the first addition to X as a constant!! (as is evident from the code below). How can I make it treat the perturbation as a function of X?

X = torch.ones(1).requires_grad_()
Y1 = torch.ones(1).requires_grad_()*4
Y2 = torch.ones(1).requires_grad_()*4
p_y1 = 3*X
# Instead of deriving the perturbation from the gradient as before,
# hard-code the same value (-0.006) as a constant:
delX = torch.tensor([-0.006])
nX = X + delX.detach().clone()  # explicitly detached, just like grad.data was
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
print("Final_y:", final_y)
l2 = F.mse_loss(final_y, Y2)
grad = torch.autograd.grad(l2, [X])[0]
print("Loss: ", l2, "Final grad: ", grad)

This gives the same output as before:

Final_y: tensor([2.9910], grad_fn=<DivBackward0>)
Loss:  tensor(1.0181, grad_fn=<MseLossBackward0>) Final grad:  tensor([-6.0540])

This matches my analytical calculations as well, but how can I get the actual gradient using PyTorch?
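
For completeness, here is a sketch of what I think should work, assuming create_graph=True makes the computed gradient itself part of the graph, so that nX stays a function of X (and the .data/.detach() calls are dropped):

X = torch.ones(1).requires_grad_()
Y1 = torch.ones(1).requires_grad_()*4
Y2 = torch.ones(1).requires_grad_()*4
p_y1 = 3*X
l1 = F.mse_loss(p_y1, Y1)
# create_graph=True builds a graph for the gradient computation itself,
# so grad remains differentiable wrt X (it also implies retain_graph=True)
grad = torch.autograd.grad(l1, [X], create_graph=True)[0]
nX = X + grad*0.001  # no .data here, so nX depends on X through grad as well
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
l2 = F.mse_loss(final_y, Y2)
grad = torch.autograd.grad(l2, [X])[0]
print("Loss: ", l2, "Final grad: ", grad)

If this reasoning is right, the final grad should come out around -6.1085, matching the analytical -6.108486.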