Yes, you should not zero out the gradients before calling optimizer.step(), as you would otherwise lose the gradients computed in that backward pass.
This post explains different approaches for gradient accumulation.
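For illustration, here is a minimal sketch of one such approach (model, data, and hyperparameter names are made up):

```python
import torch

# Gradient accumulation sketch: gradients from each backward() call are
# summed into the .grad buffers, and optimizer.step() / zero_grad() are
# only called every `accum_steps` batches, so no backward pass is lost
# by zeroing the gradients too early.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
accum_steps = 4

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 10)   # stand-in batch
    y = torch.randn(8, 1)
    # scale the loss so the accumulated gradients match a single larger batch
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()          # accumulates into .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()         # apply the accumulated gradients
        optimizer.zero_grad()    # only now reset the gradients
```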