Backpropagating multiple times before optimizer.step()

What will happen if I backpropagate multiple times before optimizer.step()?

Coincidentally, I got a RuntimeError like this:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Which suggests I can backpropagate multiple times before optimizer.step(), as long as I specify retain_graph=True. But what will happen if I do that?
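For reference, here is a minimal sketch of the pattern I mean (the model, data, and learning rate are just placeholders, not my actual code):

```python
import torch

# Toy model and data, only for illustration.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

loss = ((model(x) - y) ** 2).mean()

loss.backward(retain_graph=True)  # keep the graph so it can be backpropagated again
loss.backward()                   # without retain_graph=True above, this raises the RuntimeError

optimizer.step()                  # both backwards have been summed into .grad, so this steps with twice the gradient
optimizer.zero_grad()
```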

Under what circumstances should I use it like this?

This is useful in specific cases, e.g. when using torch.autograd.grad (with create_graph=True) to incorporate derivatives in your loss function.
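For example, something like a gradient penalty fits that description. A rough sketch (the model, data, and penalty weight here are made up just for illustration):

```python
import torch

# Illustrative model and data for a gradient-penalty style loss.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
y = torch.randn(8, 1)

out = model(x)
data_loss = ((out - y) ** 2).mean()

# Gradient of the output w.r.t. the input; create_graph=True records the graph
# of this gradient computation so it can itself be backpropagated.
grad_x, = torch.autograd.grad(out.sum(), x, create_graph=True)
penalty = grad_x.pow(2).sum()

loss = data_loss + 0.1 * penalty
loss.backward()   # backpropagates through the gradient computation as well
optimizer.step()
optimizer.zero_grad()
```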
In general, backward accumulates gradients (that’s why you do zero_grad to clear them). Backpropagating the same graph just for the sake of it would be unusual, though. It is usually more efficient to backpropagate only once per forward pass, unless you actually have one of those cases where you need the gradient in your loss function or similar.
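To make the accumulation part concrete, the usual "several backwards, one step" pattern looks roughly like this (again with a made-up model; each backward uses its own forward here, so no retain_graph is needed):

```python
import torch

# Illustrative model/optimizer; the micro-batch loop is the point here.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer.zero_grad()
for _ in range(4):                            # four micro-batches, one forward/backward each
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = ((model(x) - y) ** 2).mean() / 4   # scale so the sum of gradients averages over micro-batches
    loss.backward()                           # .grad fields accumulate (sum) across calls
optimizer.step()                              # one update using the accumulated gradients
optimizer.zero_grad()                         # clear for the next iteration
```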

Best regards

Thomas
