I am trying to do something like this (simplified version of my code):

for x in range(1, 1000):
    output = model(data)
    # change data
    loss = loss + F.nll_loss(output, target)
    # calculate gradients of model in backward pass
    loss.backward()
    # collect gradients
    final_result = final_result + myvar.grad.data

The problem is that the large number of temporary variables causes me to run out of GPU memory. Is the following piece of code logically equivalent?

for x in range(1, 1000):
    output = model(data)
    loss = F.nll_loss(output, target)
    # change data
    # calculate gradients of model in backward pass
    loss.backward(retain_graph=True)
    # collect gradients
    final_result = final_result + myvar.grad.data
    del loss
    del other_variables

If I understand how .backward() and .grad.data work correctly, the two versions should be equivalent. However, this is not the case for me, and I'm currently looking for the bug.

The addition of final_result = final_result + myvar.grad.data won't work if you don't zero out the gradients in each iteration.
Since loss.backward() already accumulates the gradients into .grad, you are currently accumulating the running sums grad_1, grad_1 + grad_2, grad_1 + grad_2 + grad_3, and so on, instead of the individual gradients.
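For illustration, here is a minimal sketch (a toy scalar parameter, not your actual setup) of how .grad grows across backward() calls:

import torch

w = torch.ones(1, requires_grad=True)
for i in range(3):
    loss = (2 * w).sum()  # d(loss)/dw = 2 in every iteration
    loss.backward()
    print(w.grad)         # tensor([2.]), tensor([4.]), tensor([6.])

Summing w.grad inside this loop would therefore give 2 + 4 + 6 = 12 rather than the intended 2 + 2 + 2 = 6.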
Another approach would be to let loss.backward() accumulate the gradients automatically and just assign final_result after the loop.

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1: accumulate the loss, call backward() once after the loop
torch.manual_seed(2809)
model = nn.Linear(1, 2, bias=False)
loss = 0.
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = loss + F.nll_loss(output, target)
loss.backward()
final_grad1 = model.weight.grad

# 2: call backward() in each iteration and let .grad accumulate
torch.manual_seed(2809)
model = nn.Linear(1, 2, bias=False)
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = F.nll_loss(output, target)
    loss.backward()
final_grad2 = model.weight.grad

print(torch.allclose(final_grad1, final_grad2))
> True
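If you instead want final_result to hold the sum of the per-iteration gradients, as in your first snippet, here is a minimal sketch of a memory-friendly loop using the same toy model (the names are illustrative, not your actual code). Note that retain_graph=True is unnecessary here, since a fresh graph is built in every iteration:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(2809)
model = nn.Linear(1, 2, bias=False)
final_result = torch.zeros_like(model.weight)
for _ in range(1000):
    x = torch.randn(1, 1)
    target = torch.randint(0, 2, (1,))
    output = model(x)
    loss = F.nll_loss(output, target)
    loss.backward()                    # frees this iteration's graph
    final_result += model.weight.grad  # .grad holds only this iteration's gradient
    model.zero_grad()                  # reset before the next backward()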

model.zero_grad() will zero out the gradients of all internal parameters.
If you've registered self.myvar = nn.Parameter(...), it will also be zeroed out.
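A minimal sketch with a hypothetical module to illustrate this:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 2, bias=False)
        self.myvar = nn.Parameter(torch.randn(2))  # registered parameter

    def forward(self, x):
        return self.fc(x) + self.myvar

model = MyModel()
model(torch.randn(1, 1)).sum().backward()
print(model.myvar.grad)  # non-zero gradient

model.zero_grad()
print(model.myvar.grad)  # zeroed out (None on newer PyTorch versions, where set_to_none=True is the default)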