Mini-batch by accumulating gradients in PyTorch

Hi, all.

I have a question about mini-batches. Say I have sentences of variable lengths within one mini-batch, and I want to update the parameters only after the whole mini-batch has been fed through the model. The code might look like this:

 model.zero_grad()
 for sample in batch:
     loss = model(sample)
     loss.backtrack()
 optim.step()

I see from the docs that the gradients of leaf nodes are accumulated. I wonder whether the gradients of non-leaf variables are also accumulated in this case.

I know there are options like padding or sorting the samples, but I'm still curious whether it's appropriate to do it this way.

Thanks


Hi,
Yes, this will work as you expect: the gradients that accumulate across the repeated backward calls are those of the leaf parameters (the model's weights), and that accumulated .grad is exactly what optim.step() applies. Non-leaf intermediates do not keep their gradients around by default.
BTW it's backward, not backtrack.
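In case it helps anyone later, here is a minimal, self-contained sketch of the pattern above. The LSTM model, fake data, and learning rate are placeholders I made up for illustration; only the zero_grad / backward / step structure matters. Dividing each per-sample loss by len(batch) makes the accumulated gradient an average over the mini-batch rather than a sum:

 import torch
 import torch.nn as nn

 torch.manual_seed(0)

 # Toy model: an LSTM encoder plus a linear head (placeholder architecture).
 model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
 head = nn.Linear(16, 1)
 optim = torch.optim.SGD(list(model.parameters()) + list(head.parameters()), lr=0.01)
 criterion = nn.MSELoss()

 # Fake mini-batch of sequences with different lengths; no padding is needed
 # because each sample goes through the model on its own.
 batch = [(torch.randn(1, length, 8), torch.randn(1, 1)) for length in (5, 9, 3, 7)]

 optim.zero_grad()                    # clear the parameters' .grad once per mini-batch
 for x, target in batch:
     out, _ = model(x)                # out: (1, length, 16)
     pred = head(out[:, -1, :])       # use the last time step
     loss = criterion(pred, target) / len(batch)  # average instead of sum
     loss.backward()                  # gradients accumulate in the leaf parameters
 optim.step()                         # one update for the whole mini-batch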
