How are gradients actually accumulated?

‘Gradient will not be updated but be accumulated, and updated every N rounds.’ I have a question about how the gradients are accumulated in the code snippet below. In every iteration of the loop, a new gradient is computed by loss.backward() and presumably stored internally, but is this stored gradient overwritten in the next iteration? How are the gradients summed up and then applied every N rounds?

for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass: adds to each parameter's .grad
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients before the next N rounds


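Gradients are accumulated until model.zero_grad() (or optimizer.zero_grad()) is called. Each loss.backward() adds the newly computed gradient to the .grad attribute of every parameter instead of overwriting it, so .grad holds the running sum over all backward passes since the last reset. optimizer.step() then uses whatever is stored in .grad at that point.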
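To see the accumulation directly, here is a minimal sketch with a single scalar parameter. The names w, grads_seen and grad_after_reset are made up for this illustration and do not come from the snippet in the question.

```python
import torch

# One scalar parameter, so the gradient is easy to follow by hand.
w = torch.tensor(2.0, requires_grad=True)

grads_seen = []
for x in (1.0, 3.0, 5.0):
    loss = w * x                      # d(loss)/dw = x
    loss.backward()                   # adds x to w.grad, does not overwrite it
    grads_seen.append(w.grad.item())

print(grads_seen)                     # [1.0, 4.0, 9.0], the running sum of 1, 3, 5

# Resetting by hand; zero_grad() either zeroes .grad like this or sets it
# to None, depending on the PyTorch version and arguments.
w.grad.zero_()
grad_after_reset = w.grad.item()
print(grad_after_reset)               # 0.0
```

If backward() 'set' the gradient, the list would read [1.0, 3.0, 5.0] instead.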

A toy notebook is here: how-gradients-are-accumulated-in-real.ipynb · GitHub
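As for why the loss is divided by accumulation_steps: with a loss that averages over the batch, summing the scaled gradients of N equal-sized micro-batches should give the same gradient as one backward pass over the combined batch. The sketch below checks this on random toy data; the model, data shapes and MSELoss are assumptions chosen for illustration, not taken from the question.

```python
import torch

torch.manual_seed(0)
accumulation_steps = 4
inputs = torch.randn(8, 3)            # toy data: 8 samples, 3 features
labels = torch.randn(8, 1)
loss_function = torch.nn.MSELoss()    # averages over the batch by default
model = torch.nn.Linear(3, 1)

# (a) One backward pass over the full batch of 8 samples.
model.zero_grad()
loss_function(model(inputs), labels).backward()
full_batch_grad = model.weight.grad.clone()

# (b) Four backward passes over micro-batches of 2 samples each,
#     with every loss scaled by 1 / accumulation_steps.
model.zero_grad()
micro_batches = zip(inputs.chunk(accumulation_steps), labels.chunk(accumulation_steps))
for micro_inputs, micro_labels in micro_batches:
    loss = loss_function(model(micro_inputs), micro_labels) / accumulation_steps
    loss.backward()                   # accumulates into model.weight.grad
accumulated_grad = model.weight.grad.clone()

# The two gradients agree up to floating-point rounding.
grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
print(grads_match)
```

Without the division, the accumulated gradient would be N times larger, which acts like multiplying the learning rate by N for plain SGD.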


Thanks a lot for the clarification and the code! Before coming to this gradient accumulation topic, it's easy for a beginner like me to assume that loss.backward() ‘sets’ the grad rather than ‘accumulates’ into it.
