You should replace the following:
(you should not give the same Tensor to different parameters; otherwise they share the same memory and will always have the same value. And if you run an update on one and then the other, you end up applying the update twice)
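To illustrate what that shared-Tensor problem looks like in isolation (a minimal sketch; the names weight, p1, p2 are made up and not from the code in this thread):

```python
import torch
import torch.nn as nn

weight = torch.randn(3)

# Problematic: both Parameters are built from the same Tensor, so they alias
# the same storage -- they always hold identical values, and stepping an
# optimizer over both applies the update to that storage twice.
p1 = nn.Parameter(weight)
p2 = nn.Parameter(weight)
print(p1.data_ptr() == p2.data_ptr())  # True: same memory

# Safer: clone so each parameter owns its own memory.
q1 = nn.Parameter(weight.clone())
q2 = nn.Parameter(weight.clone())
print(q1.data_ptr() == q2.data_ptr())  # False
```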
Thank you @albanD. This worked for me. Everything is fine in the first epoch, but I run into a problem in the second epoch.
Following up on the question I asked yesterday, which you answered:
My model has two sets of parameters, a and b, which require gradients. I have two different loss functions:
loss1 = function(a, b)
loss2 = function(b)
and the total loss is
total_loss = loss1 + loss2
Assuming:
loss1 contributes b1.grad,
loss2 contributes b2.grad,
then the total gradient computed by total_loss.backward() is b.grad = b1.grad + b2.grad.
My goal is to modify b1.grad coming from loss1 and b2.grad coming from loss2, and then add them together as the backward gradient for b. Currently, when I do total_loss.backward(), it already gives me the accumulated gradient b.grad = b1.grad + b2.grad. How can I access and modify each individual b1.grad and b2.grad, so that total_loss.backward() returns the sum of the modified b1.grad and b2.grad?
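For context, one generic way to get at the two gradients separately is to backpropagate each loss on its own with torch.autograd.grad (just a toy sketch with made-up stand-ins for the real model and losses; the 0.5 and 2.0 factors are placeholders for whatever modification is needed):

```python
import torch

# Toy stand-ins for the real parameters and loss function.
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

def function(*params):
    # placeholder loss: anything differentiable will do for the sketch
    return sum((p ** 2).sum() for p in params)

loss1 = function(a, b)
loss2 = function(b)

# Backprop each loss separately so the per-loss gradients stay visible.
a_grad, b1_grad = torch.autograd.grad(loss1, (a, b))
(b2_grad,) = torch.autograd.grad(loss2, b)

# Modify each piece as needed, then store the combined result where the
# optimizer expects it.
a.grad = a_grad
b.grad = 0.5 * b1_grad + 2.0 * b2_grad
```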
You told me to make intermediate gradients, which I did in this code sample:
```python
# accumulate the gradients of all center parameters
centers_grad = 0
for param in criterion.parameters():
    centers_grad += param.grad

# update centers grad: write the summed gradient back to every parameter
for param in criterion.parameters():
    param.grad = centers_grad
    print(f'c.grad = {param.grad}')
```
However, I'm not sure whether this is doing the job of b.grad = b1.grad + b2.grad. It is fine in the first epoch, but doing the same thing in the second epoch already adds the gradients for me. It seems like the centers_grad variable gets attached to something. Do you have any idea how to fix this?
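P.S. A quick way to check the "attached to something" suspicion is to look at grad_fn on these tensors (a generic check, reusing the names from the snippet above):

```python
# A grad_fn / requires_grad=True here means autograd is still tracking the
# tensor, i.e. the graph from the previous iteration is being kept alive.
print(centers_grad.requires_grad, centers_grad.grad_fn)
for param in criterion.parameters():
    print(param.grad.requires_grad, param.grad.grad_fn)
```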
Given that you modify .grad with tensors that have requires_grad=True, I guess the next backward keeps the graph from the previous iterations? Can you try setting all the .grad fields to None at the end of your iteration (instead of calling .zero_grad())?
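A minimal sketch of that suggestion, assuming the usual model / criterion objects from the snippet above (the model name and the loop structure are assumptions, not from the original code):

```python
# ... forward pass, total_loss.backward(), gradient surgery, optimizer.step() ...

# Instead of optimizer.zero_grad(), drop the .grad tensors entirely so that
# nothing from this iteration's graph can leak into the next backward pass.
for param in model.parameters():
    param.grad = None
for param in criterion.parameters():
    param.grad = None
```

(On recent PyTorch versions, optimizer.zero_grad(set_to_none=True) has the same effect for the parameters registered with that optimizer.)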