PyTorch Gradients

Hi,

When you call loss.backward(), the gradients are accumulated in place in the .grad attribute of each Variable that requires gradients.
That is why you need to call optimizer.zero_grad() before each backward pass, otherwise gradients from previous iterations are added on top of the new ones.
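
For example, here is a minimal sketch of that accumulation behaviour (using the current tensor API with requires_grad=True instead of wrapping in a Variable, and a toy scalar loss):

import torch

w = torch.ones(1, requires_grad=True)

loss = (2 * w).sum()       # loss = 2w, so dloss/dw = 2
loss.backward()
print(w.grad)              # tensor([2.])

loss = (2 * w).sum()       # recompute the same loss
loss.backward()            # without zeroing, the new gradient is added to .grad
print(w.grad)              # tensor([4.]) = 2 + 2, not overwritten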

If you want to accumulate gradients from multiple backward passes, you can simply call backward() multiple times without resetting the gradients in between:

optimizer.zero_grad()               # clear any previously accumulated gradients

for i in range(minibatch):
    loss = model(batch_data[i])     # forward pass on one sub-batch (model returns the loss here)
    loss.backward()                 # gradients are summed into .grad across iterations

optimizer.step()                    # single update using the accumulated gradients
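
One thing to keep in mind: because the gradients are summed over the inner loop, the accumulated gradient corresponds to the sum of the per-sub-batch losses. If you want it to behave like the average over the whole batch instead, you can scale each loss before calling backward(). This is a sketch under the same hypothetical model / batch_data names as above:

optimizer.zero_grad()

for i in range(minibatch):
    loss = model(batch_data[i])
    (loss / minibatch).backward()   # scale so the accumulated gradient is the mean, not the sum

optimizer.step()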