How gradients are accumulated in real

linzhu1206 · March 20, 2021, 10:14pm

‘Gradient will not be updated but be accumulated, and updated every N rounds.’ My question is how the gradients are accumulated in the code snippet below. In every iteration of the loop, loss.backward() computes a new gradient, which should be stored internally. Is this stored gradient overwritten in the next iteration? How are the gradients summed up and then applied every N iterations?

for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset the accumulated gradients

crcrpar · March 21, 2021, 12:06am

Hi,

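Gradients are accumulated until model.zero_grad() is called: each loss.backward() adds the newly computed gradients to the values already stored in each parameter's .grad instead of overwriting them.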

A toy notebook is here: how-gradients-are-accumulated-in-real.ipynb · GitHub
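
To see the accumulation directly, here is a minimal sketch on a single tensor (this is not the content of the linked notebook; the tensor `w` and the two toy losses are made up for illustration). It calls backward() twice without resetting, then clears the gradient by hand with `w.grad.zero_()`, which is roughly what model.zero_grad() does for every parameter (recent PyTorch versions set .grad to None by default instead of filling it with zeros).

```python
import torch

# Hypothetical toy parameter, chosen only for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) = 3 for each element.
loss1 = (w * 3.0).sum()
loss1.backward()
grad_after_first = w.grad.clone()

# Second backward pass WITHOUT zeroing: d/dw sum(5 * w) = 5 is added
# to the stored gradient, so .grad holds 3 + 5 = 8 for each element.
loss2 = (w * 5.0).sum()
loss2.backward()
grad_after_second = w.grad.clone()

# Reset the stored gradient in place.
w.grad.zero_()
grad_after_reset = w.grad.clone()

print(grad_after_first)   # tensor([3., 3.])
print(grad_after_second)  # tensor([8., 8.])
print(grad_after_reset)   # tensor([0., 0.])
```

So in the loop from the question, optimizer.step() sees the sum of the gradients from the last accumulation_steps backward passes, and nothing is refreshed until zero_grad() runs.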

linzhu1206 · March 21, 2021, 5:44am

Thanks a lot for the clarification and the code! Before coming across this gradient accumulation topic, it's easy for a beginner like me to assume that loss.backward() ‘sets’ the gradient rather than ‘accumulates’ it.
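
For later readers, the 'accumulate, not set' behaviour is also why the loss in the original loop is divided by accumulation_steps. The sketch below is an illustration under assumptions: the tiny Linear model, the random data, and the MSELoss with its default mean reduction are all made up, and the micro-batches are equal in size. It checks that summing the gradients of the scaled micro-batch losses gives the same gradient as one backward pass over the full batch.

```python
import torch

torch.manual_seed(0)

# Hypothetical model and data, chosen only for illustration.
model = torch.nn.Linear(4, 1)
loss_function = torch.nn.MSELoss()  # default reduction is the mean
inputs = torch.randn(8, 4)
labels = torch.randn(8, 1)
accumulation_steps = 4

# Reference: a single backward pass over the whole batch of 8 samples.
model.zero_grad()
loss_function(model(inputs), labels).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulation: 4 micro-batches of 2 samples, each loss scaled by 1/4.
model.zero_grad()
micro_batches = zip(inputs.chunk(accumulation_steps), labels.chunk(accumulation_steps))
for x, y in micro_batches:
    loss = loss_function(model(x), y) / accumulation_steps
    loss.backward()  # adds this micro-batch's gradient to .grad
accumulated_grad = model.weight.grad.clone()

# The two gradients agree up to floating-point rounding.
print(torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6))
```

Without the division, the accumulated gradient would be accumulation_steps times larger than the full-batch one when the loss is a mean.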