Hi, the following code outputs 10, but I do not understand how it arrives at this result. Would anyone be able to explain the logic behind it?
import torch
from torch.autograd import Variable
val = torch.FloatTensor([1])
x = Variable(val, requires_grad=True)
y = x * 2
z = y ** 2
torch.autograd.backward([z], [val], retain_variables=True)
torch.autograd.backward([y], [val], retain_variables=True)
g = x.grad
print(g)  # Outputs: 10
Because gradients accumulate in the .grad buffer by default; retain_variables=True only keeps the graph's intermediate buffers alive so that you can call backward() a second time on the same graph. Since z = 4x², dz/dx = 8x = 8 at x = 1, and dy/dx = 2, so the buffer ends up with 8 + 2 = 10.
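In case it helps, here is a minimal sketch of the same computation with the newer tensor API (in recent PyTorch releases Variable is merged into Tensor and the flag is called retain_graph rather than retain_variables), printing the buffer after each backward call:

import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2          # y = 2x
z = y ** 2         # z = 4x^2

z.backward(retain_graph=True)   # writes dz/dx = 8x = 8 into x.grad
print(x.grad)                   # tensor([8.])

y.backward()                    # adds dy/dx = 2 to the existing buffer
print(x.grad)                   # tensor([10.])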
I guess every Variable that requires gradients, such as x, has its own gradient buffer (.grad), and each call to backward() adds the computed gradient into the buffer it belongs to. So dz/dx and dy/dx are both summed into the same buffer of the Variable x.
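A quick way to see that it is the buffer doing the accumulation: if you zero x.grad between the two backward calls, the sum disappears (again a sketch with the newer API):

import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2
z = y ** 2

z.backward(retain_graph=True)
print(x.grad)        # tensor([8.])  -- dz/dx lands in x's buffer

x.grad.zero_()       # clear the buffer before the next backward
y.backward()
print(x.grad)        # tensor([2.])  -- now only dy/dx is there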
I suppose it has nothing to do with the chain rule here. The chain rule would give dz/dx = dz/dy * dy/dx, but what you have are dz/dx and dy/dx computed separately. What your code has done is: "compute the gradient of the current graph starting from node z and keep the result, then compute the gradient of the current graph starting from node y, and finally print the gradient buffer of node x".
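If you actually wanted the chain-rule pieces, you could pull them out explicitly with torch.autograd.grad and multiply them, which gives 8 rather than the accumulated 10 (a sketch, not what the original code does):

import torch

x = torch.tensor([1.0], requires_grad=True)
y = x * 2
z = y ** 2

dz_dy, = torch.autograd.grad(z, y, retain_graph=True)   # dz/dy = 2y = 4
dy_dx, = torch.autograd.grad(y, x)                       # dy/dx = 2
print(dz_dy * dy_dx)   # tensor([8.]) = dz/dx, not the accumulated 10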