Hi,
When you call loss.backward(), the gradients are accumulated in place in each Variable that requires gradients, rather than being overwritten. That is why you need to call optimizer.zero_grad() before each backward pass if you do not want gradients from previous iterations to be mixed into the update.
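For example, in a standard training loop you clear the gradients at every iteration, so each optimizer.step() only uses the gradients from the current backward pass. A minimal sketch, with a hypothetical linear model, loss, and random data just for illustration:

import torch
import torch.nn as nn

# Hypothetical setup for illustration only
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
data = torch.randn(32, 10)
target = torch.randn(32, 1)

for step in range(3):
    optimizer.zero_grad()                     # clear gradients left by the previous backward
    loss = criterion(model(data), target)
    loss.backward()                           # write this step's gradients into each .grad
    optimizer.step()                          # update using only this step's gradients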
If you want to accumulate gradients from multiple backward passes instead, you can simply call backward() multiple times without resetting the gradients in between, and only step the optimizer once at the end:
# Accumulation loop: minibatch is the number of accumulation steps,
# and the model here is assumed to return the loss directly.
optimizer.zero_grad()                 # reset .grad once, before accumulating
for i in range(minibatch):
    loss = model(batch_data[i])
    loss.backward()                   # each call adds this minibatch's gradients to .grad
optimizer.step()                      # single update using the accumulated gradients
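Note that each backward() call adds the full gradient of its minibatch into .grad, so the accumulated gradient corresponds to the sum of the individual losses. If you want the equivalent of averaging over the accumulated minibatches, divide each loss by minibatch (the number of accumulation steps) before calling backward().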