This post gives you a few examples of gradient accumulation approaches and their advantages or shortcomings.
Based on your current code snippet, it seems you would like to use `torch.autograd.grad`
instead, which I think should work in a similar manner. Note that it returns the gradients rather than accumulating them into the `.grad` attributes, so you might need to perform the gradient accumulation manually.
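
Here is a minimal sketch of what the manual accumulation could look like. The helper name `accumulate_grads`, the toy `Linear` model, and the random micro-batches are placeholders I made up for illustration, not something from your code, and it assumes all micro-batches have the same size:

```python
import torch


def accumulate_grads(model, criterion, micro_batches):
    """Sum the gradients of several micro-batches by hand (hypothetical helper)."""
    params = list(model.parameters())
    accum_steps = len(micro_batches)
    # One buffer per parameter to hold the running sum of gradients.
    accumulated = [torch.zeros_like(p) for p in params]
    for x, y in micro_batches:
        # Scale the loss so the sum matches the mean over all micro-batches.
        loss = criterion(model(x), y) / accum_steps
        # torch.autograd.grad returns the gradients and leaves p.grad untouched.
        grads = torch.autograd.grad(loss, params)
        for buf, g in zip(accumulated, grads):
            buf += g
    return accumulated


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)  # toy model, assumption for this example
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
# Toy data: 4 micro-batches of 8 samples each.
micro_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]

accumulated = accumulate_grads(model, criterion, micro_batches)

# Hand the accumulated gradients to the optimizer via .grad, then step.
for p, buf in zip(model.parameters(), accumulated):
    p.grad = buf
optimizer.step()
optimizer.zero_grad()
```

With equally sized micro-batches and a mean-reduced loss, the accumulated gradients should match what a single `backward()` call on the concatenated batch would produce.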