Will this procedure cause a double gradient?

Hello, I am trying to incorporate multiple losses into a multi-model structure. My current workflow is as follows.

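import torch.optim as optim

# One optimizer, with a separate learning rate for each parameter group
optimizer = optim.SGD([{'params': model_1.parameters(), 'lr': 0.1},
                       {'params': model_2.parameters(), 'lr': 0.01},
                       {'params': criterion_1.parameters(), 'lr': 0.01}])

# Forward pass: each model returns its output and an internal loss
output_1, loss_1 = model_1(input)
output_2, loss_2 = model_2(output_1)
# criterion_1 has learnable parameters; criterion_2 does not
loss_3 = criterion_1(output_1, labels)
loss_4 = criterion_2(output_2, labels)

# Sum all four losses and backpropagate once
loss = loss_1 + loss_2 + loss_3 + loss_4
optimizer.zero_grad()
loss.backward()
optimizer.step()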



I am worried that this will cause a double gradient, because backward() will go through both loss_1 and loss_3, which both depend on model_1. The same applies to loss_2 and loss_4, although criterion_2 has no parameters to optimize.

I am also wondering what a double gradient would cause if it happened.

Thank you very much.


I am not sure what you mean by “double gradient”.
But in this case, autograd will backprop through both criteria, then through model_2, and once all the gradients for model_1’s output have been accumulated, it will backprop through model_1.
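
To make that ordering concrete, here is a minimal sketch. The two nn.Linear layers, the tensor shapes, and the branch losses are hypothetical stand-ins, not the modules from the question. A tensor hook on output_1 records every gradient delivered to it, so you can check that model_1 only sees one accumulated gradient even though two branches consume its output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the modules in the question
model_1 = nn.Linear(4, 3)
model_2 = nn.Linear(3, 2)

x = torch.randn(5, 4)
output_1 = model_1(x)

# Record every gradient that arrives at output_1 during backward
received = []
output_1.register_hook(lambda g: received.append(g.clone()))

# Two branches consume output_1
branch_a = model_2(output_1).pow(2).mean()  # goes through model_2 first
branch_b = output_1.abs().mean()            # acts directly on output_1

(branch_a + branch_b).backward()

# Number of times a gradient was delivered to output_1
print(len(received))
```

The single tensor in received is the gradient that then flows back into model_1's parameters.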

Thank you for your reply! I was wondering whether, in this case, different sets of gradients for model_1 would be used in the optimization. The backpropagation seems to reach model_1 both from loss_1 and from output_1, which is the input to model_2 (which computes loss_2) and to criterion_1 (which computes loss_3). I am not sure whether this will cause a problem.

The gradients will correspond to the final loss value. Since you sum everything up here, gradients will just be the sum of the ones from every branch.
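
A quick way to convince yourself is to compare the gradient of the summed loss with the sum of the per-branch gradients. This is only a sketch under assumptions: the nn.Linear layers, the shapes, and the two losses below are made up for illustration and are not the setup from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the modules in the question
model_1 = nn.Linear(4, 3)
model_2 = nn.Linear(3, 2)
x = torch.randn(5, 4)
labels = torch.randint(0, 2, (5,))

def branch_losses():
    # Rebuild the graph on each call so every gradient query is independent
    output_1 = model_1(x)
    output_2 = model_2(output_1)
    loss_a = output_1.pow(2).mean()            # depends on model_1 only
    loss_b = F.cross_entropy(output_2, labels) # depends on model_1 and model_2
    return loss_a, loss_b

params = list(model_1.parameters())

# Gradient of the summed loss with respect to model_1's parameters
loss_a, loss_b = branch_losses()
grad_total = torch.autograd.grad(loss_a + loss_b, params)

# Gradient of each branch taken separately
loss_a, loss_b = branch_losses()
grad_a = torch.autograd.grad(loss_a, params, retain_graph=True)
grad_b = torch.autograd.grad(loss_b, params)

# Compare the summed-loss gradient with the sum of the branch gradients
matches = all(torch.allclose(t, a + b, atol=1e-6)
              for t, a, b in zip(grad_total, grad_a, grad_b))
print(matches)
```

So each parameter gets a single gradient, the one for the total loss, and the optimizer applies it once per step.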