I want to use gradient accumulation like below:
for step, (data, target) in enumerate(train_loader):
    loss = criterion(model(data), target)
    loss.backward()              # gradients keep accumulating in .grad
    if step % accum == 0:
        optimizer.step()         # update using everything accumulated so far
I never call optimizer.zero_grad() anywhere. What will happen in this case? So far my training actually looks quite good.
The gradients will be accumulated for all batches in your train_loader, while the optimizer performs a step after every accum batches using these accumulated gradients.
Usually you would zero out the gradients after calling optimizer.step(), so that each parameter update only uses the gradients accumulated over the last accum batches.
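For reference, the usual pattern could look something like this (a minimal sketch assuming model, criterion, optimizer, accum, and train_loader are defined as in your code; dividing the loss by accum is optional and just turns the accumulated sum into an average):

optimizer.zero_grad()
for step, (data, target) in enumerate(train_loader):
    loss = criterion(model(data), target)
    (loss / accum).backward()        # scale so the accumulated gradient is an average
    if (step + 1) % accum == 0:
        optimizer.step()             # update with gradients from the last accum batches
        optimizer.zero_grad()        # reset before accumulating the next accum batches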
It’s interesting to hear that it’s working. Did you compare it with other approaches (e.g. zeroing out the gradients after the optimizer.step() call)?
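If you want to see the default accumulation behavior in isolation, a tiny check like this (the tensor and values are just illustrative) shows how .grad keeps growing when it is never zeroed:

import torch

w = torch.ones(2, requires_grad=True)
for _ in range(3):
    (2.0 * w).sum().backward()

print(w.grad)  # tensor([6., 6.]) -> each backward() call added 2 to every element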