Optimizer keeps updating tensors from previous batch

Here’s a slightly simplified version of my problem: let’s say I have 100 parameters that I pass to an optimizer. In a loop I then load a batch of 4 new images, pass them through some CNNs, compute the loss, and backpropagate it to the 100 parameters (the parameters are used in one of the functions inside the CNN). The loss for each image depends on one of these parameters, so after each batch has been processed the optimizer should update only 4 of them. This works fine for the first batch (4 parameters get updated), but after the second batch the optimizer updates 8 parameters instead of 4: the 4 corresponding to the images of the current batch, but also the 4 corresponding to the images of the previous batch. This keeps repeating, and with every new batch more and more parameters get updated (instead of just the current 4 that I want).

My training loop looks something like

import torch

parameters = [torch.tensor([1.0], requires_grad=True),
              torch.tensor([2.0], requires_grad=True),
              torch.tensor([3.0], requires_grad=True)]
optimizer = torch.optim.Adam(parameters, lr)

for image_batch in image_loader:
    loss = calculate_loss(image_batch, parameters)  # only some of the parameters enter the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

I have checked the code and I’m zeroing the gradients before each optimizer update with optimizer.zero_grad(), so the gradients shouldn’t be accumulating. I have also checked the gradients of the parameters after calling loss.backward(), and the only parameters with non-zero gradients are the 4 I want to update, so that looks fine as well. What confuses me is that the extra parameters that get updated had zero gradients after loss.backward(), yet their values still changed after calling optimizer.step(). Does anybody know what’s going on here? Thanks!
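
Here is a minimal, made-up sketch of what I mean (two scalar parameters and a dummy quadratic loss; I zero the grads with set_to_none=False so the unused parameter keeps a zero gradient rather than None, which matches what I see in my real code):

import torch

a = torch.tensor([1.0], requires_grad=True)
b = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.Adam([a, b], lr=0.1)

# Batch 1: the loss uses both parameters.
optimizer.zero_grad(set_to_none=False)
loss = (a ** 2 + b ** 2).sum()
loss.backward()
optimizer.step()

# Batch 2: the loss uses only a, so b.grad stays at zero after backward()...
optimizer.zero_grad(set_to_none=False)
b_before = b.item()
loss = (a ** 2).sum()
loss.backward()
optimizer.step()

print(b.grad)               # tensor([0.])
print(b.item() - b_before)  # ...yet the value of b still changed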

Hi,

That looks very similar to the thread “The .grad is zero, the value change”, no?

Yes, I have read that post, but I still don’t know what’s causing the problem here. Is there any way to keep it from updating the parameters from the previous batch? I thought that since the previous batch’s parameters haven’t been used to calculate the current loss, they shouldn’t be updated.

Update: to be more specific, I still want to use Adam or SGD with momentum, I just don’t want the parameters to update if they haven’t been used to calculate the loss for the current batch.

But then what should Adam or SGD do on these steps? Not update the momentum terms? In that case you would get wrong momentum terms, since they would be working with a different set of parameters all the time.
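
For instance (just a rough sketch with made-up numbers, not a recommendation): if the gradients of the unused parameters are left as None, Adam simply skips those parameters, but then their running averages stay frozen at whatever they were the last time the parameter was used:

import torch

params = [torch.tensor([float(i + 1)], requires_grad=True) for i in range(3)]
optimizer = torch.optim.Adam(params, lr=0.1)

# Step 1: every parameter gets a gradient, so Adam creates state for all of them.
loss = sum((p ** 2).sum() for p in params)
loss.backward()
optimizer.step()

# Step 2: only params[0] is used; the other grads stay None and Adam skips them.
optimizer.zero_grad(set_to_none=True)
loss = (params[0] ** 2).sum()
loss.backward()
optimizer.step()

print(optimizer.state[params[1]]["exp_avg"])  # unchanged since step 1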

Probably a better solution in this case would be to create a separate optimizer for each of the parameters, since I want to update them separately (i.e. I don’t want to update all of the parameters on every iteration of the training loop).
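
Roughly what I have in mind (a sketch only; image_loader and calculate_loss are the same placeholders as in my loop above, and I’m assuming the loader can also tell me which parameter indices each batch uses, which is made up for this example):

import torch

parameters = [torch.tensor([1.0], requires_grad=True) for _ in range(100)]
# One optimizer per parameter: stepping one leaves the Adam state of the others untouched.
optimizers = [torch.optim.Adam([p], lr) for p in parameters]

for image_batch, param_indices in image_loader:  # param_indices: which parameters this batch uses (assumed)
    used = [optimizers[i] for i in param_indices]
    for opt in used:
        opt.zero_grad()
    loss = calculate_loss(image_batch, parameters)
    loss.backward()
    for opt in used:  # only step the optimizers whose parameter appeared in this batch's loss
        opt.step()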