Hi, the following code outputs 10, but I do not understand how the code arrives at this result. Would anyone be able to explain what the logic is for this result?

import torch
from torch.autograd import Variable

val = torch.FloatTensor([1])
x = Variable(val, requires_grad=True)
y = x * 2   # y = 2x
z = y ** 2  # z = (2x)^2 = 4x^2

# [val] is the upstream gradient for each output, here 1.
torch.autograd.backward([z], [val], retain_variables=True)  # backward from z
torch.autograd.backward([y], [val], retain_variables=True)  # backward from y
g = x.grad
print(g)  # Outputs: 10

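Because gradients accumulate in x.grad across backward() calls. retain_variables=True only keeps the graph's intermediate buffers alive so that a second backward pass is possible; it is not what causes the accumulation. At x = 1, dz/dx = 8 and dy/dx = 2, so you get 8 + 2 = 10.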
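For anyone on a newer PyTorch, here is a rough sketch of the same experiment. It assumes a release where Variable has been merged into Tensor and retain_variables has been renamed retain_graph; the names after_z and after_y are only for illustration.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2   # y = 2x
z = y ** 2  # z = 4x^2

# First backward pass: x.grad starts empty and receives dz/dx = 8.
# retain_graph=True keeps the graph's buffers so we can backprop again.
z.backward(torch.ones_like(z), retain_graph=True)
after_z = x.grad.clone()

# Second backward pass: dy/dx = 2 is added on top of the 8 already stored.
y.backward(torch.ones_like(y))
after_y = x.grad.clone()

print(after_z)  # tensor([8.])
print(after_y)  # tensor([10.])
```

The second print shows the sum because nothing cleared x.grad between the two calls.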

I guess the gradient of any Variable, such as x, is stored in a separate buffer, and each call to backward() adds the gradient it computes to the buffer it belongs to. So dz/dx and dy/dx are both added to the same buffer of the Variable x.
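If the buffer picture is right, clearing it between the two calls should give the two gradients separately. A small sketch of that check, assuming a recent PyTorch where tensors carry requires_grad directly (grad_from_z and grad_from_y are illustrative names):

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2
z = y ** 2

# Backward from z fills the buffer with dz/dx.
z.backward(torch.ones_like(z), retain_graph=True)
grad_from_z = x.grad.clone()

# Clear the buffer in place before the next backward pass.
x.grad.zero_()

# Backward from y now writes dy/dx into an empty buffer.
y.backward(torch.ones_like(y))
grad_from_y = x.grad.clone()

print(grad_from_z)  # tensor([8.])
print(grad_from_y)  # tensor([2.])
```

This is the same reason training loops usually zero the gradients before each backward pass.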

I suppose the sum has nothing to do with the chain rule here. The chain rule says dz/dx = dz/dy * dy/dx, but what is being added here is dz/dx and dy/dx. What your code does is: 'calculate the gradient of the current graph from node z and keep it; then calculate the gradient of the current graph from node y, and print the gradient buffer of node x'.
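To see the chain rule and the accumulated sum as two separate things, one option is torch.autograd.grad, which returns gradients instead of adding them to x.grad. This is only a sketch assuming a recent PyTorch; the variable names are made up for the example.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2
z = y ** 2

ones = torch.ones_like(x)  # upstream gradient of 1 for each output

# Each call returns a tuple with one gradient per requested input.
(dz_dy,) = torch.autograd.grad(z, y, grad_outputs=ones, retain_graph=True)
(dy_dx,) = torch.autograd.grad(y, x, grad_outputs=ones, retain_graph=True)
(dz_dx,) = torch.autograd.grad(z, x, grad_outputs=ones, retain_graph=True)

print(dz_dy, dy_dx, dz_dx)  # tensor([4.]) tensor([2.]) tensor([8.])
print(x.grad)               # None, nothing was accumulated
```

Here dz_dy * dy_dx = 4 * 2 = 8 is the chain rule giving dz_dx, while the 10 in the original post is dz_dx + dy_dx.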