Collecting gradients for multiple losses?

I am trying to do something like this (simplified version of my code):

loss = 0.
for x in range(1, 1000):
    output = model(data)
    # Change data
    loss = loss + F.nll_loss(output, target)

# Calculate gradients of model in backward pass
loss.backward()

# Collect gradients
final_result = final_result + ...

The problem is that the large number of temporary variables is causing me to run out of GPU memory. Is the following code logically equivalent?

for x in range(1, 1000):
    output = model(data)
    loss = F.nll_loss(output, target)
    # Change data
    # Calculate gradients of model in backward pass
    loss.backward()

    # Collect gradients
    final_result = final_result + ...
    del loss
    del other_variables

If I understand correctly how .backward() works, then it should be equivalent. However, this is not the case for me, and I’m currently looking for the bug.

The line final_result = final_result + ... won’t work if you don’t zero out the gradients in each iteration.
Currently you are accumulating:

final_result = (grad0) + (grad0+grad1) + (grad0+grad1+grad2) + ...

since loss.backward() already accumulates the gradients in the .grad attribute of each parameter.
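A minimal sketch of zeroing the gradients in every iteration, assuming a small toy model and random data (the `collect` helper and the other variable names are hypothetical). It compares collecting `.grad` with and without zeroing against a single backward pass through the summed loss; float64 is used only so that the different order of the floating-point additions cannot affect the comparison.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical toy model and data; float64 keeps rounding differences negligible
model = nn.Linear(1, 2, bias=False, dtype=torch.float64)
data = [
    (torch.randn(1, 1, dtype=torch.float64), torch.randint(0, 2, (1,)))
    for _ in range(5)
]


def collect(zero_each_iteration):
    """Call backward() per sample and sum model.weight.grad after each call."""
    model.zero_grad()
    final_result = torch.zeros_like(model.weight)
    for x, target in data:
        if zero_each_iteration:
            model.zero_grad()  # discard the gradients of earlier iterations
        loss = F.nll_loss(model(x), target)
        loss.backward()  # adds this sample's gradient to model.weight.grad
        final_result = final_result + model.weight.grad
    return final_result


grad_naive = collect(zero_each_iteration=False)  # (g0) + (g0+g1) + (g0+g1+g2) + ...
grad_zeroed = collect(zero_each_iteration=True)  # g0 + g1 + g2 + ...

# Reference: a single backward pass through the summed loss
model.zero_grad()
total_loss = sum(F.nll_loss(model(x), target) for x, target in data)
total_loss.backward()
reference = model.weight.grad.clone()

print(torch.allclose(grad_zeroed, reference))  # True
print(torch.allclose(grad_naive, reference))   # False
```

Without zeroing, final_result picks up the running totals from the formula above, so it does not match the single-backward reference.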
Another approach would be to let loss.backward() accumulate the gradients automatically and just assign final_result after the loop.

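import torch
import torch.nn as nn
import torch.nn.functional as F

# 1: sum all losses, then call backward() once
torch.manual_seed(0)  # same initial weights and data for both runs
model = nn.Linear(1, 2, bias=False)
loss = 0.
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = loss + F.nll_loss(output, target)
loss.backward()
final_grad1 = model.weight.grad

# 2: call backward() in every iteration; the gradients accumulate in .grad
torch.manual_seed(0)
model = nn.Linear(1, 2, bias=False)
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = F.nll_loss(output, target)
    loss.backward()
final_grad2 = model.weight.grad

print(torch.allclose(final_grad1, final_grad2))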
> True

Is it still true that loss.backward() accumulates the gradients even though I call “del loss” in each iteration?

Deleting the loss shouldn’t make a difference, as backward() has already accumulated the gradients into the .grad attributes of the parameters.
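A minimal sketch, assuming a toy nn.Linear model and random data (the `accumulate` helper is hypothetical), that runs the same loop twice, once with `del loss` after every backward() call and once without, and compares the accumulated gradients:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def accumulate(delete_loss):
    """Call backward() in every iteration, optionally deleting the loss afterwards."""
    torch.manual_seed(0)  # same weights and data for both runs
    model = nn.Linear(1, 2, bias=False)
    for _ in range(10):
        x = torch.randn(1, 1)
        target = torch.randint(0, 2, (1,))
        loss = F.nll_loss(model(x), target)
        loss.backward()  # the gradient is added to model.weight.grad right here
        if delete_loss:
            del loss  # only drops the Python reference to the loss tensor
    return model.weight.grad


print(torch.equal(accumulate(delete_loss=True), accumulate(delete_loss=False)))  # True
```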


Sorry, one thing I forgot to mention: I am calling model.zero_grad() after each iteration. Does this zero out the

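model.zero_grad() will zero out the gradients of all parameters registered in the model, including those of its submodules.
If you’ve registered self.myvar = nn.Parameter(...) in your module, its gradient will be reset as well. (Since PyTorch 2.0, zero_grad() sets the gradients to None by default instead of filling them with zeros; use zero_grad(set_to_none=False) to get zero tensors.)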
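A minimal sketch with a hypothetical module `MyModel` that registers `self.myvar` as an `nn.Parameter` next to a regular layer; it checks that `myvar` is registered, receives a gradient, and is reset by `model.zero_grad()`:

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)
        # Assigning an nn.Parameter as an attribute registers it with the module
        self.myvar = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.fc(x) * self.myvar


model = MyModel()
print("myvar" in dict(model.named_parameters()))  # True

model(torch.randn(4, 3)).sum().backward()
print(model.myvar.grad is not None)  # True

model.zero_grad()
# Newer PyTorch versions reset .grad to None; older ones fill it with zeros
print(model.myvar.grad is None or bool((model.myvar.grad == 0).all()))  # True
```

If you sum up param.grad yourself, read it after backward() and before zero_grad(), since it may be None afterwards.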
