When training neural network models in PyTorch, does it make a difference where we call the backward() method? For example, which of the two snippets below is correct?
Calculate gradient across the batch:
for e in range(epochs):
    loss_sum = 0.0                        # reset the running loss each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        nn_model.zero_grad()
        loss.backward()
        optimizer.step()
    loss_list.append(loss_sum / num_train_obs)
Calculate gradient across the epoch:
for e in range(epochs):
    loss_sum = 0.0                        # reset the running loss each epoch
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
    nn_model.zero_grad()
    loss_sum.backward()
    optimizer.step()
    loss_list.append(loss_sum / num_train_obs)
Thanks @crowsonkb. I think I had a typo in the second snippet: backward() should be called on loss_sum, not loss. With that fixed, would your opinion still hold?
To accumulate gradients across an entire epoch, try something along the lines of the following code:
for e in range(epochs):
    optimizer.zero_grad()                 # clear gradients once per epoch
    loss_sum = 0.0                        # running loss, for reporting only
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss.backward()                   # gradients accumulate across batches
        loss_sum += loss.item()
    optimizer.step()                      # one update per epoch, using the accumulated gradients
    loss_list.append(loss_sum / num_train_obs)
I think your second snippet will need loss_sum += loss instead of loss.item() to work at all: .item() returns a plain Python float, which has no backward(). I think mine may also be more efficient, since calling loss.backward() at the end of each batch lets that batch's intermediate computation graph be freed right away, instead of keeping every batch's graph alive until the end of the epoch. Someone with more experience with autograd might want to chime in here, though.
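For concreteness, here is a minimal self-contained sketch of that fixed second version, accumulating the loss tensor and calling backward() once per epoch. The toy model, data, and hyperparameters are placeholders I made up, not from the posts above. It should produce the same gradients per epoch as the per-batch backward() loop, but it keeps every batch's graph in memory until the single backward() call:

import torch
import torch.nn as nn

# toy setup, purely for illustration (not from the original thread)
nn_model = nn.Linear(10, 1)
loss_function = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(nn_model.parameters(), lr=0.01)
batches_list = [torch.randn(32, 10) for _ in range(5)]
targets_list = [torch.randn(32, 1) for _ in range(5)]

epochs = 3
num_train_obs = sum(b.shape[0] for b in batches_list)
loss_list = []

for e in range(epochs):
    optimizer.zero_grad()
    loss_sum = 0.0                        # becomes a tensor after the first `+= loss`
    for i, actual in zip(batches_list, targets_list):
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss                  # keep the tensor, so the graph is retained
    loss_sum.backward()                   # one backward over the epoch's summed loss
    optimizer.step()
    loss_list.append(loss_sum.item() / num_train_obs)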