If I understood you correctly: for backpropagation purposes the graph is viewed with the output (the loss) as its root, and the input layer composed of leaf nodes.
The gradients with respect to intermediate nodes have to be computed (otherwise one cannot apply the chain rule), but once the backward pass finishes, you only need the gradients with respect to the leaf nodes in order to update the weights. So the intermediate gradients are not retained.
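A minimal sketch of this behavior, assuming PyTorch (which the leaf/intermediate terminology suggests): after `backward()`, `.grad` is populated only on leaf tensors, while intermediate nodes discard theirs unless you explicitly call `retain_grad()` on them.

```python
import torch

# Leaf node: created directly by the user with requires_grad=True
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)  # input data, no gradient needed

# Intermediate node: produced by an operation inside the graph
h = w * x
loss = h.sum()  # root of the backward graph

loss.backward()

print(w.grad)  # populated: gradient w.r.t. the leaf, used to update weights
print(h.grad)  # None: intermediate gradients are not kept by default
```

If you do need an intermediate gradient (e.g. for debugging), calling `h.retain_grad()` before `backward()` makes autograd keep it.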