This is certainly not true if you specify
retain_graph=True, and in some simple cases, it seems to be possible to backpropagate multiple times even without specifying
retain_graph=True (but I don’t understand why). Also, the docs for
backward say about retain_graph that in nearly all cases setting it to True is not needed and can often be worked around in a much more efficient way.
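As a minimal sketch of the "simple cases" point: the second-backward error seems to come from accessing saved intermediate tensors after they have been freed, so a graph whose backward functions save nothing (like sum) can apparently be backpropagated through repeatedly, while one that saves its inputs (like mul) cannot:

```python
import torch

x = torch.ones(3, requires_grad=True)

# sum's backward saves no intermediate tensors, so freeing the
# graph leaves nothing behind that a second backward would need.
y = x.sum()
y.backward()
y.backward()  # succeeds; gradients accumulate in x.grad

# mul saves its inputs for backward; once the graph is freed,
# a second backward raises a RuntimeError.
z = (x * x).sum()
z.backward()
try:
    z.backward()
except RuntimeError as e:
    print("second backward failed:", e)
```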
But I am not sure if this is really true. In architectures I have worked with I have often had to specify
retain_graph=True, and if there are more efficient ways of doing what I needed to do, I couldn’t find them. (Is there some explanation somewhere of what these more efficient workarounds are and in what cases they work and in what apparently rare cases they fail?)
For instance, two cases I have encountered are: (1) two different loss functions that update different parameters but are computed through some of the same graph, and (2) an RNN where you want to do backpropagation through time with overlapping backprop windows (e.g., backprop 512 steps, then 256 steps later backprop another 512 steps).
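The first case can be sketched as follows (a hypothetical setup, not from the original post: a shared encoder feeding two heads, each with its own loss and optimizer; the module names and sizes are made up for illustration). The first backward must pass retain_graph=True because the second loss still needs the shared part of the graph:

```python
import torch
import torch.nn as nn

# Hypothetical modules: a shared encoder and two task heads.
encoder = nn.Linear(4, 8)
head_a = nn.Linear(8, 1)
head_b = nn.Linear(8, 1)

opt_a = torch.optim.SGD(
    list(encoder.parameters()) + list(head_a.parameters()), lr=0.1)
opt_b = torch.optim.SGD(head_b.parameters(), lr=0.1)

x = torch.randn(2, 4)
shared = encoder(x)                      # shared part of the graph
loss_a = head_a(shared).pow(2).mean()    # updates encoder + head_a
loss_b = head_b(shared).pow(2).mean()    # updates head_b only

# Keep the shared graph alive for the second backward pass.
loss_a.backward(retain_graph=True)
loss_b.backward()
opt_a.step()
opt_b.step()
```

When both updates happen at the same time, one workaround that avoids retain_graph entirely is to sum the losses and call backward once, i.e. (loss_a + loss_b).backward(); but if the two losses are stepped on different schedules, that option is not always available.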