Backpropagating through multiple optimizer steps

How can I set up a model/optimizer so that I can take multiple optimizer steps on a training loss, and then compute the gradient of my final training loss with respect to the starting parameters or with respect to per-sample weights on my training data?

When I try to set something like this up, I can’t get any gradients using the model/optimizer abstractions suggested in the PyTorch tutorials. The optimizer failing makes sense: it updates the parameters in place, so the computation graph presumably wouldn’t record those changes. What’s unclear to me is why I still can’t solve this even when I loop through my model’s parameters and apply gradient descent manually. Thanks!
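For concreteness, here is roughly what my manual attempt looks like (a hypothetical sketch with plain SGD standing in for my real setup):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
x = torch.randn(4, 10)
target = torch.randint(0, 2, (4,))

params = list(model.parameters())
lr = 0.1
for _ in range(3):
    loss = criterion(model(x), target)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        # the in-place update is invisible to autograd, so later losses
        # have no path back to the starting parameter values
        for p, g in zip(params, grads):
            p -= lr * g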

I’m not sure I understand the issue, so please correct me if I’m wrong.
As far as I understand, you would like to pass multiple batches through your model, calculate the loss for each, and use the accumulated loss to compute the gradients?

If so, you could do exactly that, or alternatively call loss.backward() in each iteration, since this will also accumulate the gradients:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
x = torch.randn(1, 10)
target = torch.empty(1, dtype=torch.long).random_(2)

# approach 1: accumulate the losses and call backward once
losses = 0
for _ in range(10):
    output = model(x)
    loss = criterion(output, target)
    losses += loss
print(losses)
losses.backward()
print(model.weight.grad)

# approach 2: call backward in each iteration; the gradients
# accumulate in the .grad attributes
model.zero_grad()
for _ in range(10):
    output = model(x)
    loss = criterion(output, target)
    loss.backward()
print(model.weight.grad)
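Both versions produce the same gradients; the first keeps all ten graphs alive until the single backward() call, while the second frees each graph right after its backward().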

Here is also a good explanation of the different approaches.

Unfortunately, no, that’s not what I’m looking to do. Using your example as a start:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())
x = torch.randn(10, 10)  # batch of 10, so each sample gets its own weight
target = torch.empty(10, dtype=torch.long).random_(2)
losses = 0
weights = torch.ones(10, requires_grad=True)  # per-sample weights on the training data
for _ in range(10):
    output = model(x)
    # nn.CrossEntropyLoss takes reduction at construction time, so use the
    # functional form here to get the per-sample losses
    loss = F.cross_entropy(output, target, reduction='none')
    w_loss = (loss * weights).sum() / weights.sum()
    losses += w_loss
print(losses)
losses.backward()
opt.step()

weights.grad.zero_()
model.zero_grad()
for _ in range(10):
    output = model(x)
    loss = criterion(output, target)
    loss.backward()
# now I would like the gradient with respect to weights (but I find that it is 0)
print(weights.grad)

Before your second loop you zero the gradients of weights, and weights isn’t used anywhere after that.
How should its gradient be calculated? Do you want to keep the last gradient of weights and add it to the new one?

So, if I were able to backpropagate through the gradient step taken between the two loops, the loss in the second loop would depend on weights, because that step was computed from the weighted loss.

The computation graph is created in each new forward pass. Because weights isn’t used in your second loop, the computation graph won’t include it at all.
If you want a gradient with respect to weights, it has to be part of the computation that produces the loss. The in-place opt.step() also breaks that path, so you would need to apply the parameter update out of place and take the inner gradients with create_graph=True.
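As a rough sketch of what that could look like (a minimal example assuming a single plain SGD step in place of Adam, with the parameters held as plain tensors so the update can be applied out of place):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(2, 10, requires_grad=True)    # model parameters as plain tensors
b = torch.zeros(2, requires_grad=True)
weights = torch.ones(10, requires_grad=True)  # per-sample weights
x = torch.randn(10, 10)
target = torch.randint(0, 2, (10,))

# inner step: weighted loss, with create_graph=True so the gradients
# themselves stay differentiable
loss = F.cross_entropy(F.linear(x, W, b), target, reduction='none')
w_loss = (loss * weights).sum() / weights.sum()
gW, gb = torch.autograd.grad(w_loss, (W, b), create_graph=True)
W1, b1 = W - 0.1 * gW, b - 0.1 * gb  # out-of-place update keeps the graph

# outer loss uses the updated parameters; its graph reaches back
# through gW and gb to weights
outer_loss = F.cross_entropy(F.linear(x, W1, b1), target)
outer_loss.backward()
print(weights.grad)  # no longer zero

Libraries such as higher automate this pattern for full nn.Modules and stateful optimizers like Adam.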