How to update sequential part of model using different loss

Hi everyone,

I have a network made of two parts, named part A and part B, and two losses, named loss_A and loss_B. The following code shows their relationship:

# input -> A -> B -> output
x = A(input)
output = B(x)

loss_A = lossA(output, label)
loss_B = lossB(output, label)

Note that optimizing loss_A should update only part A, and optimizing loss_B should update only part B. I am wondering how to achieve this.

Here is my attempted solution, but I am not sure if it is correct.

# in function forward()
x = A(input)
output = B(x)                   # gradients can flow back into A
output_detach = B(x.detach())   # gradients stop at x, so A is untouched

# in function main()
optimizer_A = SGD([{'params': A.parameters()}], lr=lr, weight_decay=weight_decay)
optimizer_B = SGD([{'params': B.parameters()}], lr=lr, weight_decay=weight_decay)

loss_B = lossB(output_detach, label)
optimizer_B.zero_grad()
loss_B.backward(retain_graph=True)  # only B receives gradients, since x was detached
optimizer_B.step()

loss_A = lossA(output, label)
optimizer_A.zero_grad()
loss_A.backward()  # gradients also land on B here, but optimizer_A only holds A's parameters
optimizer_A.step()
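
For reference, here is a self-contained toy version of what I am trying, with placeholder linear layers and an MSE criterion standing in for A, B, lossA, and lossB. In this sketch both backward() calls come before either step(), since stepping optimizer_B in between can raise an in-place modification error in recent PyTorch versions (B's parameters are still needed for loss_A's backward):

import torch
import torch.nn as nn
from torch.optim import SGD

# Placeholder modules standing in for part A and part B.
A = nn.Linear(10, 10)
B = nn.Linear(10, 1)

optimizer_A = SGD(A.parameters(), lr=0.1)
optimizer_B = SGD(B.parameters(), lr=0.1)

inputs = torch.randn(4, 10)  # placeholder batch
label = torch.randn(4, 1)
criterion = nn.MSELoss()     # placeholder for lossA / lossB

x = A(inputs)
output = B(x)                  # graph reaches back into A
output_detach = B(x.detach())  # graph stops at x, so only B gets gradients

optimizer_A.zero_grad()
loss_A = criterion(output, label)
loss_A.backward()        # fills A's grads (and B's, which are cleared below)

optimizer_B.zero_grad()  # clears the unwanted loss_A gradients from B
loss_B = criterion(output_detach, label)
loss_B.backward()        # fills B's grads only, thanks to the detach

optimizer_A.step()       # updates A from loss_A
optimizer_B.step()       # updates B from loss_B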

Could anyone help with this question? Thanks in advance!

What you want to do is quite similar to GoogLeNet, where auxiliary losses are back-propagated directly to intermediate layers. I think there is no need to call backward() on loss_A and loss_B separately. You can try this:

optimizer_A.zero_grad()
optimizer_B.zero_grad()
loss_A = ...
loss_B = ...
loss = loss_A + loss_B
loss.backward()
optimizer_A.step()
optimizer_B.step()
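
As a side note, the two optimizers could equivalently be collapsed into a single SGD with two parameter groups, using the same param-group syntax as in your attempt above. This is just a sketch reusing the names A, B, lr, weight_decay, loss_A, and loss_B from earlier in the thread:

from torch.optim import SGD

# One optimizer over both parts; each part keeps its own param group,
# so per-group settings (e.g. different learning rates) remain possible.
optimizer = SGD(
    [{'params': A.parameters()},
     {'params': B.parameters()}],
    lr=lr, weight_decay=weight_decay,
)

optimizer.zero_grad()
loss = loss_A + loss_B
loss.backward()
optimizer.step()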

Also, this question may be helpful to you: https://stackoverflow.com/questions/53994625/how-can-i-process-multi-loss-in-pytorch

Thanks for your response!

But with your approach, when you do

loss = loss_A + loss_B
loss.backward()

gradients from both loss_A and loss_B are computed for part A and part B and accumulated together. So when you then do

optimizer_B.step()

optimizer_B updates part B with gradients that include loss_A's contribution, which is exactly what I want to avoid.
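
To make this concrete, here is a tiny check with placeholder linear layers showing that back-propagating loss_A alone already fills B's gradients, because the path from loss_A back to part A runs through part B:

import torch
import torch.nn as nn

A = nn.Linear(5, 5)  # placeholder for part A
B = nn.Linear(5, 1)  # placeholder for part B

inputs = torch.randn(3, 5)
label = torch.randn(3, 1)

output = B(A(inputs))
loss_A = nn.functional.mse_loss(output, label)
loss_A.backward()

print(B.weight.grad)  # non-zero: optimizer_B.step() would apply loss_A's gradients to B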

Could you give me more advice please? Thanks!