I’m pretty sure this is a silly question, but I’ll ask it anyway.

Look at the example at http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

How the gradient of `out` with respect to `x` is calculated is pretty clear and makes sense according to the chain rule.

What I don’t understand is why the other nodes have no gradient. E.g., `print(y.grad)` prints `None`.

Why is that? After all, `y` (or `z`) is another variable (in fact, a `Variable`) involved in the chain rule…


`y` and `z` aren’t leaf nodes, because their computation depends on `x`, so their gradients aren’t saved.
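This is easy to check directly. A minimal sketch following the tutorial's example, written with the modern tensor API (`Variable` has since been merged into `torch.Tensor`):

```python
import torch

# Recreate the tutorial's graph: x is a leaf, y and z are intermediates.
x = torch.ones(2, 2, requires_grad=True)
y = x + 2          # non-leaf: created by an operation on x
z = y * y * 3
out = z.mean()

out.backward()

print(x.is_leaf)   # True
print(y.is_leaf)   # False
print(x.grad)      # saved: d(out)/dx = 4.5 for every element
print(y.grad)      # None: intermediate gradients are not retained
```

Autograd computes the gradient flowing through `y` during the backward pass, but frees it immediately because `y` is not a leaf.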


Thank you, this answers my question.

If I didn’t misunderstand you: for backpropagation purposes, the graph is viewed with the output (the loss) as its root and the input layer as its leaf nodes.

The gradient with respect to intermediate nodes has to be computed (otherwise one cannot apply the chain rule), but once the backward pass finishes, you only need the gradients with respect to the leaf nodes in order to update the weights. So the intermediate gradients aren’t saved.

Am I right?

@Ernst_Stavro_Blofeld yes you are right.
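For completeness: if you ever do want to inspect an intermediate gradient, PyTorch lets you opt in with `retain_grad()`. A short sketch, using the same graph as the tutorial:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()    # ask autograd to keep this intermediate gradient
z = y * y * 3
out = z.mean()
out.backward()

print(y.grad)      # now populated instead of None
```

Here `out = mean(3 * y**2)`, so `d(out)/dy = 1.5 * y / 1 = 6 * y / 4`, which is `4.5` for every element since `y = 3` everywhere. Without the `retain_grad()` call, `y.grad` would stay `None` as discussed above.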


Thanks, both of you have been very kind.