I want to use gradient accumulation like below:
for step, (data, target) in enumerate(train_loader):
    loss = criterion(model(data), target)
    loss.backward()              # gradients keep accumulating in .grad
    if step % accum == 0:
        optimizer.step()         # update using everything accumulated so far
I never call optimizer.zero_grad() anywhere. What will happen in this case? So far my training actually looks quite good.
The gradients will be accumulated for all batches in your train_loader, while the optimizer performs a step after every accum batches using these accumulated gradients.
Usually you would zero out the gradients after calling optimizer.step(), so that each parameter update only uses the gradients accumulated over the last accum batches.
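For reference, the usual pattern could look something like this (a minimal sketch assuming model, criterion, optimizer, accum, and train_loader are defined as in your code; dividing the loss by accum is optional and just turns the accumulated sum into an average):

optimizer.zero_grad()
for step, (data, target) in enumerate(train_loader):
    loss = criterion(model(data), target)
    (loss / accum).backward()        # scale so the accumulated gradient is an average
    if (step + 1) % accum == 0:
        optimizer.step()             # update with gradients from the last accum batches
        optimizer.zero_grad()        # reset before accumulating the next accum batches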
It’s interesting to hear that it’s working. Did you compare it with other approaches (e.g. zeroing out the gradients after the optimizer.step() call)?
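If you want to see the default accumulation behavior in isolation, a tiny check like this (the tensor and values are just illustrative) shows how .grad keeps growing when it is never zeroed:

import torch

w = torch.ones(2, requires_grad=True)
for _ in range(3):
    (2.0 * w).sum().backward()

print(w.grad)  # tensor([6., 6.]) -> each backward() call added 2 to every element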