Why do we need to set the gradients manually to zero in pytorch?

@tom, since it is possible to accumulate the loss over several minibatches and then do a single parameter update, suppose I want to update the parameters once every 64 minibatches. I have the following code:

total_loss = Variable(torch.zeros(1), requires_grad=True)

for idx, (data, target) in enumerate(train_loader):

    data, target = Variable(data), Variable(target)
    output = model(data)
    loss = criterion(output, target)
    # keep adding the losses so one big graph is built over 64 minibatches
    total_loss = total_loss + loss

    if (idx + 1) % 64 == 0:
        # average the accumulated loss, then do a single update
        total_loss = total_loss / (64 * batchsize)
        total_loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # start a fresh accumulator for the next 64 minibatches
        total_loss = Variable(torch.zeros(1), requires_grad=True)

Is the above code correct to achieve the desired effect?
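For comparison, this is the other approach I have in mind: call backward() on every minibatch so the gradients themselves accumulate in the .grad buffers, and only step/zero every 64 minibatches. This is just a sketch, assuming the same model, criterion, optimizer, train_loader and batchsize as above:

# Sketch: accumulate gradients instead of accumulating the loss graph.
# Assumes the same model, criterion, optimizer, train_loader, batchsize as above.
for idx, (data, target) in enumerate(train_loader):
    data, target = Variable(data), Variable(target)
    output = model(data)
    # scale each loss so the summed gradients match the averaged-loss version
    loss = criterion(output, target) / (64 * batchsize)
    loss.backward()            # gradients are added into the .grad buffers

    if (idx + 1) % 64 == 0:
        optimizer.step()       # one update for 64 minibatches
        optimizer.zero_grad()  # clear the accumulated gradients

As far as I understand, this should give the same update, but it does not keep the graphs of all 64 minibatches alive in memory the way the first version does until backward() is called. Is that right?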
