 # Collecting gradients for multiple losses?

I am trying to do something like this (simplified version of my code):

```python
for x in range(1, 1000):
    output = model(data)
    # Change data
    loss = loss + F.nll_loss(output, target)

# Calculate gradients of model in backward pass
loss.backward()

# Collect gradients
final_result = final_result + myvar.grad.data
```

The problem is that the large number of temporary variables causes me to run out of GPU memory. Is this next piece of code logically equivalent?

```python
for x in range(1, 1000):
    output = model(data)
    loss = F.nll_loss(output, target)
    # Change data
    # Calculate gradients of model in backward pass
    loss.backward(retain_graph=True)

    # Collect gradients
    final_result = final_result + myvar.grad.data
    del loss
    del other_variables
```

If I understand how `.backward` and `.grad.data` work correctly, the two should be equivalent. However, that is not the case for me, and I’m currently looking for the bug.

The addition of `final_result = final_result + myvar.grad.data` won’t work if you don’t zero out the gradients in each iteration.
Currently you are accumulating:

```
final_result = (grad0) + (grad0+grad1) + (grad0+grad1+grad2) + ...
```

since `loss.backward` will already accumulate the gradients.
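A minimal sketch of the zeroing approach, using a standalone tensor `w` as a hypothetical stand-in for `myvar` (names and the toy loss are made up for illustration):

```python
import torch

# w plays the role of myvar; requires_grad so autograd populates w.grad
w = torch.ones(2, requires_grad=True)
final_result = torch.zeros(2)

for _ in range(3):
    loss = (w * 2).sum()      # dloss/dw = 2 for every element
    loss.backward()
    final_result += w.grad    # grad of *this* iteration only...
    w.grad.zero_()            # ...because we zero it right afterwards

print(final_result)           # tensor([6., 6.]) -- 2 per iteration, 3 iterations
```

Without the `w.grad.zero_()` call, the loop would sum the running totals (2 + 4 + 6 per element) instead of the per-iteration gradients.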
Another approach would be to let `loss.backward()` accumulate the gradients automatically and just assign `final_result` after the loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1: accumulate the loss, call backward once
torch.manual_seed(2809)
model = nn.Linear(1, 2, bias=False)
loss = 0.
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = loss + F.nll_loss(output, target)

loss.backward()
final_grad1 = model.weight.grad

# 2: call backward in each iteration, let the gradients accumulate
torch.manual_seed(2809)
model = nn.Linear(1, 2, bias=False)
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = F.nll_loss(output, target)
    loss.backward()

final_grad2 = model.weight.grad

print(torch.allclose(final_grad1, final_grad2))
# > True
```

Is it still true that `loss.backward` accumulates the gradients even though I do `del loss` in each loop iteration?

The deletion of the loss shouldn’t make a difference, as the gradients were already accumulated.
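A quick way to convince yourself of this (toy loss chosen just for illustration):

```python
import torch

w = torch.ones(3, requires_grad=True)
loss = (w ** 2).sum()   # dloss/dw = 2 * w
loss.backward()         # accumulates the gradient into w.grad
del loss                # frees the loss tensor (and its graph)...
print(w.grad)           # ...but the accumulated gradient survives: tensor([2., 2., 2.])
```

`w.grad` lives on the leaf tensor itself, not on the loss, so deleting the loss has no effect on it.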


Sorry, one thing I forgot to add: I am calling `model.zero_grad()` after each iteration. Does this zero out `myvar.grad.data`?

`model.zero_grad` will zero out the gradients of all internal parameters.
If you’ve registered `self.myvar = nn.Parameter(...)`, it should also be zeroed out.
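A small sketch of this, with a made-up module for illustration (note that newer PyTorch versions set `.grad` to `None` by default in `zero_grad`, so `set_to_none=False` is passed here to get actual zeros):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.myvar = nn.Parameter(torch.ones(2))   # registered parameter

    def forward(self, x):
        return (self.myvar * x).sum()

model = MyModule()
model(torch.ones(2)).backward()
assert model.myvar.grad is not None                # gradient was populated

model.zero_grad(set_to_none=False)                 # zeros grads of all registered parameters
print(model.myvar.grad)                            # tensor([0., 0.])
```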
