If I understood you correctly: for backpropagation purposes the graph is viewed with the output (the loss) as its root, and the input layer composed of leaf nodes.
The gradients with respect to intermediate nodes have to be computed (otherwise one cannot apply the chain rule), but once the backward pass finishes, you only need the gradients with respect to the leaf nodes in order to update the weights. So the intermediate gradients are not retained.
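A minimal sketch of this behavior, assuming PyTorch (which the leaf/intermediate terminology suggests): after `backward()`, `.grad` is populated only on leaf tensors, while intermediate nodes discard theirs unless you explicitly call `retain_grad()` on them.

```python
import torch

# Leaf node: created directly by the user with requires_grad=True
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)  # input data, no gradient needed

# Intermediate node: produced by an operation inside the graph
h = w * x
loss = h.sum()  # root of the backward graph

loss.backward()

print(w.grad)  # populated: gradient w.r.t. the leaf, used to update weights
print(h.grad)  # None: intermediate gradients are not kept by default
```

If you do need an intermediate gradient (e.g. for debugging), calling `h.retain_grad()` before `backward()` makes autograd keep it.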