Multiple calls to "backward" with "requires_grad=True"

Hi, the following code outputs 10, but I do not understand how it arrives at that result. Would anyone be able to explain the logic behind it?

import torch
from torch.autograd import Variable

val = torch.FloatTensor([1])
x = Variable(val, requires_grad=True)
y = x * 2
z = y ** 2

torch.autograd.backward([z], [val], retain_variables=True)
torch.autograd.backward([y], [val], retain_variables=True)

g = x.grad
print(g)  # outputs: 10

That is because gradients accumulate: every call to backward() adds its result into the .grad buffer of the leaf Variables, and retain_variables=True keeps the graph alive so you can call backward() a second time. Here dz/dx = 8 and dy/dx = 2, so you get 8 + 2 = 10.
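
The same behaviour reproduces on current PyTorch versions, where Variable is no longer needed and retain_variables has been renamed retain_graph. A minimal sketch:

import torch

x = torch.tensor(1.0, requires_grad=True)
y = x * 2        # dy/dx = 2
z = y ** 2       # z = 4x^2, so dz/dx = 8x = 8

z.backward(retain_graph=True)   # writes dz/dx = 8 into x.grad
y.backward()                    # adds dy/dx = 2 to x.grad

print(x.grad)    # tensor(10.)

Without retain_graph=True on the first call, the second backward() would fail because the graph buffers get freed after the first pass.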


Hi, thanks. But why would dz/dx be added to dy/dx?

I guess the gradient of each Variable, such as x, is stored in its own buffer, and when we call backward(), the system adds each gradient into the buffer it belongs to. So dz/dx and dy/dx are put into the same buffer of the Variable x and summed there.
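
You can watch the buffer fill up by printing it after each call. A sketch on the current API, where the buffer is x.grad and can be reset with x.grad.zero_():

import torch

x = torch.tensor(1.0, requires_grad=True)
y = x * 2
z = y ** 2

z.backward(retain_graph=True)
print(x.grad)     # tensor(8.)  -> dz/dx

y.backward(retain_graph=True)
print(x.grad)     # tensor(10.) -> dz/dx + dy/dx, accumulated in the same buffer

x.grad.zero_()    # clear the buffer before an unrelated backward pass
y.backward()
print(x.grad)     # tensor(2.)  -> dy/dx alone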


But shouldn’t they be multiplied (rather than added), since we are using the chain rule? Thanks

I suppose it has nothing to do with the chain rule here. The chain rule would give dz/dx = dz/dy * dy/dx, but what gets accumulated here are dz/dx and dy/dx, two separate gradients. What your code does is: compute the gradients of the current graph starting from node z and keep them, then compute the gradients of the current graph starting from node y, then print the gradient buffer of node x.
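
If you do want the two chain-rule factors, you can take them separately with torch.autograd.grad and multiply them yourself. A sketch (the names dz_dy, dy_dx, dz_dx are just for illustration):

import torch

x = torch.tensor(1.0, requires_grad=True)
y = x * 2
z = y ** 2

(dz_dy,) = torch.autograd.grad(z, y, retain_graph=True)   # dz/dy = 2y = 4
(dy_dx,) = torch.autograd.grad(y, x, retain_graph=True)   # dy/dx = 2

print(dz_dy * dy_dx)                   # tensor(8.) = dz/dx via the chain rule
(dz_dx,) = torch.autograd.grad(z, x)   # same value from a direct backward pass from z
print(dz_dx)                           # tensor(8.)

That product is exactly what a single backward call from z accumulates into x.grad.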


OK, thanks. For some reason I was (wrongly) assuming that a double application of

torch.autograd.backward([z], [val], retain_variables=True)

would calculate the second derivative, which would involve a multiplication, and hence was inferring that the same would apply to:

torch.autograd.backward([z], [val], retain_variables=True)
torch.autograd.backward([y], [val], retain_variables=True)

Now it is clear. Thanks!
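
For reference, a second derivative can be computed, but by differentiating the gradient itself rather than by calling backward() twice. A sketch on a current PyTorch version, using torch.autograd.grad with create_graph=True so that the backward pass itself builds a graph that can be differentiated again:

import torch

x = torch.tensor(1.0, requires_grad=True)
z = (x * 2) ** 2                 # z = 4x^2

(dz_dx,) = torch.autograd.grad(z, x, create_graph=True)    # dz/dx = 8x = 8
print(dz_dx)                     # tensor(8., grad_fn=...)

(d2z_dx2,) = torch.autograd.grad(dz_dx, x)                 # d^2z/dx^2 = 8
print(d2z_dx2)                   # tensor(8.)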
