Why does .backward() give different gradients than my analytical computation when retain_graph=True?

Here is my code and reasoning; kindly let me know if my calculation is incorrect.

``````
# Consider Y = 3X; we take X = 1, target Y = 4, and perturb X by alpha = 0.001
import torch
import torch.nn.functional as F
X = torch.ones(1, requires_grad=True)
Y1 = Y2 = torch.tensor([4.])
p_y1 = 3*X
l1 = F.mse_loss(p_y1, Y1)
l1.backward(retain_graph=True)
print("grad1:", X.grad)
nX = X + 0.001*X.grad   # perturbed X (alpha = 0.001)
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
print("Final_y:", final_y)
l2 = F.mse_loss(final_y, Y2)
``````

This gives output:

``````
grad1: tensor([-6.])
``````

Now I try to compute this analytically:

``````
# loss = (y - 3x)^2  ==>  grad_x = 2*(y - 3x)*(-3) = -6*(y - 3x)
y = 4
x = 1
2*(y - 3*x)*(-3)   # grad_x
``````

This gives the output:
`-6`
This is as expected.

``````"""
Now, x get's modified as: x2 = x + 0.001*grad1
we get new Y as: 3*( x + 0.001*grad1) = 3*(x -0.006* (y - 3x) ) = 3x -0.018y +0.054x
And this is averaged with old y: 3*x to get final output: (3x + 3x -0.018y +0.054x)/2 = 3x +0.027x - 0.009y
"""
final_p = 3*x +0.027*x - 0.009*y
final_p #This is final y, so far so good
``````

This gives the output:
`2.991`
This is also as expected.

``````"""
Now, wrt this what will be grad X?
loss = (y - 3x -0.027x + 0.009y)^2 #Note here two y's have same value...so...
grad_x = 2(y - 3x -0.027x + 0.009y) (-3 - 0.027)
"""
(y - 3*x -0.027*x + 0.009*y)**2 #This is loss also seems reasonable
``````

This gives the output:
`1.0180809999999998`
This is also as expected (more or less).

`2*(y - 3*x - 0.027*x + 0.009*y)*(-3 - 0.027)  # this is the final grad on X, WHY THE DIFF?`

This gives the output:
`-6.108486`
This is different. Why? PyTorch calculated -6.0540.
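As a quick sanity check of my own (this is hand arithmetic, not part of the PyTorch run): if the 0.001*grad1 step were treated as a constant with respect to x, then d(final_y)/dx would be 3 rather than 3.027, and the gradient would be:

``````
# hypothetical check: assume the perturbation 0.001*grad1 is a constant w.r.t. x,
# so final_y = 3x + const and d(final_y)/dx = (3 + 3)/2 = 3
final_y = 2.991
y = 4
2*(final_y - y)*3   # = -6.054, which matches what PyTorch reports
``````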

EDIT: It seems that PyTorch is treating the first addition to X as a constant, as is evident from the code below. How can I make it treat that term as a function of X?

``````
X = torch.ones(1).requires_grad_()
p_y1 = 3*X
# l1 = F.mse_loss(p_y1, Y1)
# l1.backward(retain_graph=True)
nX = X + delX.detach().clone()   # delX = 0.001*grad1 from the earlier run
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
print("Final_y:", final_y)
l2 = F.mse_loss(final_y, Y2)
``````

This gives the output:

``````
Final_y: tensor([2.9910], grad_fn=<DivBackward0>)
``````
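For reference, this is the kind of flow I am after: a minimal sketch assuming that computing grad1 with `torch.autograd.grad(..., create_graph=True)` and adding it to X without detaching keeps the update differentiable with respect to X. I have not confirmed this is the intended approach:

``````
import torch
import torch.nn.functional as F

X = torch.ones(1, requires_grad=True)
Y1 = Y2 = torch.tensor([4.])

p_y1 = 3*X
l1 = F.mse_loss(p_y1, Y1)
# create_graph=True so that grad1 itself stays a function of X
grad1 = torch.autograd.grad(l1, X, create_graph=True)[0]

nX = X + 0.001*grad1          # perturbation kept in the graph, not detached
p_y2 = 3*nX
final_y = (p_y1 + p_y2)/2
l2 = F.mse_loss(final_y, Y2)

grad_X = torch.autograd.grad(l2, X)[0]
print(grad_X)                 # I would expect roughly -6.1085 here, matching the hand calculation
``````

(Here I replaced `l1.backward(retain_graph=True)` with `torch.autograd.grad(..., create_graph=True)` because, as far as I understand, the `.grad` populated by `.backward(retain_graph=True)` does not carry a graph.)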