Gradient Accumulation in Detectron2

Yes, you should not zero out the gradients without first calling optimizer.step(), as you would otherwise lose the gradients accumulated by that backward pass.
This post explains different approaches for gradient accumulation.
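The basic pattern can be sketched in plain PyTorch (the model, data, and `accum_steps` value below are placeholders for illustration, not Detectron2-specific code):

```python
import torch
import torch.nn as nn

# Hypothetical model, optimizer, and accumulation factor for illustration.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
accum_steps = 4  # number of backward passes per optimizer update

optimizer.zero_grad()
for step in range(8):
    x = torch.randn(16, 10)
    y = torch.randint(0, 2, (16,))
    # Scale the loss so accumulated gradients average over the micro-batches.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # update using the accumulated gradients
        optimizer.zero_grad()  # clear gradients only after the update
```

The key point is the order: `backward()` calls add into `.grad`, and `zero_grad()` runs only after `step()` has consumed the accumulated gradients.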