Gradient Accumulation in Detectron2

ptrblck · August 17, 2022, 7:26am

Yes, you should not zero out the gradients without executing the optimizer.step() method as you would lose this backward pass.
This post explains different approaches for gradient accumulation.