How are optimizer.step() and loss.backward() related?

This is certainly not true if you specify retain_graph=True, and in some simple cases it seems to be possible to backpropagate multiple times even without specifying retain_graph=True (but I don't understand why). Also, the docs for backward say of retain_graph that in nearly all cases setting it to True is not needed and can often be worked around in a much more efficient way.
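Here is a minimal sketch of the double-backward behaviour I mean (the tensors and ops are arbitrary, chosen only for illustration); my guess is that whether a second backward works without retain_graph=True depends on whether the backward pass needs any saved tensors:

```python
import torch

x = torch.randn(3, requires_grad=True)

# exp needs its output saved for the backward pass, so the graph holds
# saved tensors that get freed by the first backward call.
loss = x.exp().sum()
loss.backward()
try:
    loss.backward()              # saved tensors are already freed
except RuntimeError as err:
    print("second backward failed:", err)

# With retain_graph=True the saved tensors survive the first call.
loss2 = x.exp().sum()
loss2.backward(retain_graph=True)
loss2.backward()                 # works

# sum's backward needs no saved tensors (only the input shape), so this
# graph happens to tolerate repeated backward calls without retain_graph.
simple = x.sum()
simple.backward()
simple.backward()                # no error, gradients just accumulate
```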

But I am not sure the docs' claim is really true. In the architectures I have worked with I have often had to specify retain_graph=True, and if there are more efficient ways of doing what I needed to do, I couldn't find them. (Is there an explanation somewhere of what these more efficient workarounds are, in which cases they work, and in which apparently rare cases they fail?)

For instance, two cases I have encountered are: two different loss functions that update different parameters but are computed from partly the same graph, and an RNN where I want to do backpropagation through time with overlapping backprop regions (e.g. backprop through 512 steps, then 256 steps later backprop through another 512 steps). A sketch of the first case is below.
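Here is a minimal sketch of that first case, with a made-up shared encoder and two heads (names and shapes are arbitrary); without retain_graph=True on the first backward, the second backward through the shared part of the graph fails:

```python
import torch
import torch.nn as nn

# Made-up example: a shared encoder feeding two heads, each with its own loss.
encoder = nn.Linear(10, 8)
head_a = nn.Linear(8, 1)
head_b = nn.Linear(8, 1)

x = torch.randn(4, 10)
shared = encoder(x)                    # this part of the graph is shared
loss_a = head_a(shared).pow(2).mean()
loss_b = head_b(shared).pow(2).mean()

# The first backward has to keep the shared part of the graph alive;
# otherwise the second backward raises
# "Trying to backward through the graph a second time".
loss_a.backward(retain_graph=True)
loss_b.backward()

# Note that both calls accumulate gradients into the encoder's parameters,
# so if the encoder is only supposed to be updated by one of the losses,
# the other loss's gradients on it need to be excluded or zeroed before
# the corresponding optimizer.step().
```

The workaround I usually see suggested is to sum the losses and call backward once, which avoids retain_graph entirely, but that only helps when a single combined gradient is what you actually want.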
