Here’s a slightly simplified version of my problem: I have 100 parameters that I pass to an optimizer. In a loop I load a batch of 4 new images, pass them through some CNNs, compute a loss, and backpropagate that loss to the 100 parameters (the parameters are used inside one of the functions in the CNN). The loss for each image depends on exactly one of these parameters, so after each batch is processed the optimizer should update only 4 of them. This works fine for the first batch (4 parameters get updated), but after the second batch the optimizer updates 8 parameters instead of 4 – the 4 corresponding to the images in the current batch plus the 4 corresponding to the images from the previous batch. This keeps repeating, and with every new batch more and more parameters get updated instead of just the 4 I want.
My training loop looks something like
import torch

# simplified to 3 parameters here; in my real code there are 100
parameters = [torch.tensor([1.], requires_grad=True),
              torch.tensor([2.], requires_grad=True),
              torch.tensor([3.], requires_grad=True)]
optimizer = torch.optim.Adam(parameters, lr)

for image_batch in image_loader:
    loss = calculate_loss(image_batch, parameters)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
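For completeness, calculate_loss has roughly this structure. This is only a sketch: cnn, the way the batch is unpacked, and the squared-error loss are illustrative placeholders, not my actual code.

def calculate_loss(image_batch, parameters):
    # Sketch of the structure only: each image is mapped to exactly one parameter,
    # so only the parameters belonging to this batch should receive gradients.
    images, param_indices, targets = image_batch   # illustrative unpacking
    total_loss = torch.tensor(0.)
    for image, idx, target in zip(images, param_indices, targets):
        features = cnn(image.unsqueeze(0)).mean()   # placeholder for passing the image through my CNNs
        prediction = features * parameters[idx]     # this image's loss touches parameters[idx] only
        total_loss = total_loss + (prediction - target) ** 2
    return total_loss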
I have checked the code and I’m zeroing the gradients before each optimizer update with optimizer.zero_grad(), so the gradients shouldn’t be accumulating. I have also inspected the gradients of the parameters after calling loss.backward(), and the only parameters with non-zero gradients are the 4 I actually want to update, so that looks fine as well. What is confusing is that the parameters that get updated in addition to those 4 have zero gradients after loss.backward(), yet their values still change after calling optimizer.step().
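For reference, this is roughly how I’m checking it inside the loop (the bookkeeping lists and print statements are just illustrative, not part of my actual training code):

values_before = [p.detach().clone() for p in parameters]

optimizer.zero_grad()
loss.backward()

# only the 4 parameters belonging to the current batch show up here
nonzero_grad_idx = [i for i, p in enumerate(parameters)
                    if p.grad is not None and p.grad.abs().sum() > 0]
print("non-zero grads:", nonzero_grad_idx)

optimizer.step()

# but this list grows by 4 with every new batch
changed_idx = [i for i, (p, v) in enumerate(zip(parameters, values_before))
               if not torch.equal(p.detach(), v)]
print("changed values:", changed_idx)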
Does anybody know what’s going on here? Thanks!