When training neural network models in PyTorch, does it make a difference where we call the backward() method? For example, which of the two snippets below is correct?
Calculate gradient across the batch:
for e in range(epochs):
    loss_sum = 0.0                        # reset the running loss each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        nn_model.zero_grad()
        loss.backward()
        optimizer.step()
    loss_list.append(loss_sum / num_train_obs)
Calculate gradient across the epoch:
for e in range(epochs):
    loss_sum = 0.0                        # reset the running loss each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
    nn_model.zero_grad()
    loss_sum.backward()
    optimizer.step()
    loss_list.append(loss_sum / num_train_obs)
Thanks @crowsonkb. I think I had a typo in the second snippet: backward() should be called on loss_sum, not loss. With that fixed, would your opinion still hold?
To accumulate gradients across an entire epoch, try something along the lines of the following code:
for e in range(epochs):
    optimizer.zero_grad()                 # clear gradients once per epoch
    loss_sum = 0.0                        # running loss, for reporting only
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss.backward()                   # gradients accumulate across batches
        loss_sum += loss.item()
    optimizer.step()                      # one update per epoch, using the accumulated gradients
    loss_list.append(loss_sum / num_train_obs)
I think your second snippet will need loss_sum += loss instead of loss.item() to work at all: .item() returns a plain Python float, which has no backward(). I think mine may also be more efficient, since calling loss.backward() at the end of each batch lets that batch's intermediate computation graph be freed right away, instead of keeping every batch's graph alive until the end of the epoch. Someone with more experience with autograd might want to chime in here, though.
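For concreteness, here is a minimal self-contained sketch of that fixed second version, accumulating the loss tensor and calling backward() once per epoch. The toy model, data, and hyperparameters are placeholders I made up, not from the posts above. It should produce the same gradients per epoch as the per-batch backward() loop, but it keeps every batch's graph in memory until the single backward() call:

import torch
import torch.nn as nn

# toy setup, purely for illustration (not from the original thread)
nn_model = nn.Linear(10, 1)
loss_function = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(nn_model.parameters(), lr=0.01)
batches_list = [torch.randn(32, 10) for _ in range(5)]
targets_list = [torch.randn(32, 1) for _ in range(5)]

epochs = 3
num_train_obs = sum(b.shape[0] for b in batches_list)
loss_list = []

for e in range(epochs):
    optimizer.zero_grad()
    loss_sum = 0.0                        # becomes a tensor after the first `+= loss`
    for i, actual in zip(batches_list, targets_list):
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss                  # keep the tensor, so the graph is retained
    loss_sum.backward()                   # one backward over the epoch's summed loss
    optimizer.step()
    loss_list.append(loss_sum.item() / num_train_obs)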