Will this procedure cause double gradient?

Hello, I am trying to incorporate multiple losses into a multi-model structure. My current workflow is as follows.

import torch.optim as optim

# one optimizer over both models and the (parametrized) criterion_1
optimizer = optim.SGD([{'params': model_1.parameters(), 'lr': 0.1},
                       {'params': model_2.parameters(), 'lr': 0.01},
                       {'params': criterion_1.parameters(), 'lr': 0.01}])

output_1, loss_1 = model_1(input)
output_2, loss_2 = model_2(output_1)
loss_3 = criterion_1(output_1, labels)
loss_4 = criterion_2(output_2, labels)

# sum all the losses and backprop once
loss = loss_1 + loss_2 + loss_3 + loss_4

loss.backward()

optimizer.step()

I am worried that it will cause double gradients, because backward() will go through both loss_1 and loss_3, which both, in a way, come from model_1. The same goes for loss_2 and loss_4, although criterion_2 does not have parameters to be optimized.

I am also wondering what problems double gradients could cause if they did happen.
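To make concrete what I mean by "double gradients", here is a toy sketch (a single made-up parameter, not my actual models): backward() accumulates into .grad, so running it twice over the same graph stores twice the gradient.

import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3.0) ** 2            # d(loss)/dw = 18 * w = 36 at w = 2

loss.backward(retain_graph=True)
print(w.grad)                    # tensor(36.)

loss.backward()                  # a second backward over the same graph
print(w.grad)                    # tensor(72.) -- the gradient has doubled

My worry is whether something like this happens implicitly to model_1 in the workflow above.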

Thank you very much.

Hi,

I am not sure what you mean by "double gradient".
But what will happen in this case is that the backward pass will go through both criterions, then model_2, and once all the gradients for model_1's output have been accumulated, it will backprop through model_1.
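Here is a minimal sketch of that behaviour, using small stand-in modules rather than your actual ones (so all the names below are just for illustration): the gradient arriving at model_1's output is the sum of the contributions from every branch that consumes it, and model_1 itself is only backpropagated once with that summed gradient.

import torch
import torch.nn as nn

torch.manual_seed(0)
model_1 = nn.Linear(4, 4)        # stand-in for your model_1
model_2 = nn.Linear(4, 4)        # stand-in for your model_2
criterion = nn.MSELoss()

x = torch.randn(2, 4)
labels = torch.randn(2, 4)

output_1 = model_1(x)
output_1.retain_grad()           # keep .grad for this intermediate tensor
output_2 = model_2(output_1)

loss_3 = criterion(output_1, labels)
loss_4 = criterion(output_2, labels)
(loss_3 + loss_4).backward()
combined = output_1.grad.clone()

# the same two contributions computed separately
output_1b = model_1(x)
g3 = torch.autograd.grad(criterion(output_1b, labels), output_1b, retain_graph=True)[0]
g4 = torch.autograd.grad(criterion(model_2(output_1b), labels), output_1b)[0]

print(torch.allclose(combined, g3 + g4))   # True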

Thank you for your reply! I was wondering whether, in this case, different sets of gradients for model_1 would be used in the optimization, because the backpropagation reaches model_1 both from loss_1 and through output_1, which is the input of model_2 (which produces loss_2) and of criterion_1 (which produces loss_3). I am not sure if this will cause a problem.

The gradients will correspond to the final loss value. Since you sum everything up here, gradients will just be the sum of the ones from every branch.
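You can check that directly. A quick sketch with a toy stand-in module (hypothetical names, not your setup): the parameter gradients produced by the summed loss equal the sum of the gradients each loss would produce on its own.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model_1 = nn.Linear(4, 4)        # stand-in module
x = torch.randn(2, 4)
labels = torch.randn(2, 4)

out = model_1(x)
loss_a = out.pow(2).mean()        # stands in for one branch's loss
loss_b = F.mse_loss(out, labels)  # stands in for the other branch's loss

# gradient of each loss on its own (graph retained so it can be reused below)
g_a = torch.autograd.grad(loss_a, model_1.weight, retain_graph=True)[0]
g_b = torch.autograd.grad(loss_b, model_1.weight, retain_graph=True)[0]

# gradient of the summed loss
(loss_a + loss_b).backward()
print(torch.allclose(model_1.weight.grad, g_a + g_b))   # True

So summing the losses and calling backward() once does not double anything; each parameter just receives the combined gradient of the total loss.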