Optimization problem with a two-headed CNN

Hi guys, I’m currently trying to implement a CNN that has two outputs (like the figure below).

What I want is for Label1 to optimize CNN1 and Label2 to optimize CNN2, while the shared CNN is optimized by both Label1 and Label2. I’m wondering whether the following code will do that:

output1, output2 = CNN(img)

loss1 = F.kl_div(output1, label1)
loss2 = F.kl_div(output2, label2)

total_loss = loss1+loss2

optimizer.zero_grad()    # optimizer was declared earlier
total_loss.backward(retain_graph=True)
optimizer.step()

If it won’t behave the way I want, can someone explain or give me a hint on how to achieve this? Thanks very much!

It should work as you’ve explained it.
Alternatively you could call backward on each of the losses separately and check the gradients of the shared CNN and both branched CNNs.
You’ll see that the gradients are calculated for the corresponding branch and accumulated for the shared CNN.
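
Here is a minimal sketch of that check. The `TwoHeadCNN` class, its layer sizes, and the random inputs are made up for illustration and are not the model from the question. It assumes the heads return log-probabilities, since `F.kl_div` expects log-probabilities as its input, and that the labels are probability distributions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TwoHeadCNN(nn.Module):
    """Hypothetical stand-in: a shared trunk followed by two heads."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head1 = nn.Linear(8, num_classes)
        self.head2 = nn.Linear(8, num_classes)

    def forward(self, x):
        feat = self.shared(x)
        # F.kl_div expects log-probabilities as input
        out1 = F.log_softmax(self.head1(feat), dim=1)
        out2 = F.log_softmax(self.head2(feat), dim=1)
        return out1, out2


model = TwoHeadCNN()
img = torch.randn(2, 3, 16, 16)
label1 = F.softmax(torch.randn(2, 4), dim=1)  # made-up target distributions
label2 = F.softmax(torch.randn(2, 4), dim=1)

output1, output2 = model(img)
loss1 = F.kl_div(output1, label1, reduction="batchmean")
loss2 = F.kl_div(output2, label2, reduction="batchmean")

# Backward on loss1 only. retain_graph=True is needed here because
# loss2 will backpropagate through the same shared trunk afterwards.
loss1.backward(retain_graph=True)
head1_grad = model.head1.weight.grad.clone()
head2_grad_is_none = model.head2.weight.grad is None  # head2 is untouched by loss1
shared_grad = model.shared[0].weight.grad.clone()

# Backward on loss2 only.
loss2.backward()
head1_unchanged = torch.equal(model.head1.weight.grad, head1_grad)
head2_has_grad = model.head2.weight.grad is not None
shared_changed = not torch.equal(model.shared[0].weight.grad, shared_grad)

print("head2 grad is None after loss1.backward():", head2_grad_is_none)
print("head1 grad unchanged by loss2.backward():", head1_unchanged)
print("head2 has grad after loss2.backward():", head2_has_grad)
print("shared grad accumulated from both losses:", shared_changed)
```

When the two losses are instead summed and `backward` is called once, as in the original snippet, `retain_graph=True` is not required.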

Thanks for your reply.
It works the expected way.