How to manually do chain rule backprop?

Assume we have y as a function of x, y = f(x), and z as a function of y, z = g(y).
How can I compute the gradient w.r.t. y first (dz/dy, to use for something else), and then the gradient w.r.t. x (dz/dx = dz/dy * dy/dx)?

Any ideas would be helpful, thanks!


Is this approximately what you need? Let's say you have

from torch.autograd import Variable

x = Variable(torch.randn(4), requires_grad=True)
y = f(x)

y2 = Variable(y.data, requires_grad=True) # construct a new Variable from y's data to separate the graphs
z = g(y2)

(There is also Variable.detach, but let's not use it here.)

Then you can do (assuming z is a scalar)

z.backward() # computes dz/dy2 and stores it in y2.grad
y.backward(y2.grad) # backpropagates through f: accumulates (dz/dy2) * (dy/dx) = dz/dx into x.grad
print(x.grad)

Note that the .backward evaluates the derivative at the last forward computation.
(I hope this is correct, I don’t have access to my pytorch right now.)
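On newer PyTorch versions (0.4+, where Tensor and Variable merged), the same two-stage backward can be sketched without `Variable`, using `.detach().requires_grad_()` to cut the graph. The concrete `f` and `g` below are made-up examples just so the snippet runs; substitute your own functions.

```python
import torch

# Placeholder functions for illustration only (assumptions, not from the thread).
def f(x):
    return x ** 2

def g(y):
    return y.sum() * 3

x = torch.randn(4, requires_grad=True)
y = f(x)

# Detach y from the first graph and make it a leaf of a second graph.
y2 = y.detach().requires_grad_(True)
z = g(y2)

z.backward()          # dz/dy2 lands in y2.grad
y.backward(y2.grad)   # accumulates (dz/dy2) * (dy/dx) = dz/dx into x.grad

# Sanity check: the same gradient computed through a single graph.
x2 = x.detach().requires_grad_(True)
g(f(x2)).backward()
print(torch.allclose(x.grad, x2.grad))  # True
```

In between `z.backward()` and `y.backward(y2.grad)` you can inspect or modify `y2.grad`, which is the "do something else with dz/dy" step from the question.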

Best regards



This seems like a proper way to do it. I'll try it.