Question about the loss of two mutual learning networks

Hi,
I want to implement mutual learning: two ResNets learn collaboratively. I am confused about how to backpropagate the sum of the two losses.

Each net’s loss function has two parts: a supervised loss (cross entropy) and a mimicry loss (the KL divergence between the two networks’ outputs). The total loss of a network is simply the sum of those two parts.
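For reference, in the deep mutual learning formulation (Zhang et al., CVPR 2018) the loss for network 1 is loss1 = CrossEntropy(p1, y) + KL(p2 || p1), where p1 and p2 are the softmax outputs of the two networks and p2 is treated as the fixed target distribution; network 2 uses the symmetric term KL(p1 || p2).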

The following is a sketch of the loss for one network:

out1 = resnet1(image)
out2 = resnet2(image)

mimicry_loss1 = KLDivergence_loss(out1.detach(), out2.detach())
# I have to detach, otherwise I get: RuntimeError: the derivative for ‘target’ is not implemented

loss1 = crossentropy(out1, label) + mimicry_loss1
# ResNet1’s total loss

loss1.backward()
optimizer1.step() # each net has its own optimizer

Would that backprop the total loss for ResNet1? (I am not sure about the .detach() calls.)

Hi, I assume you only need to detach out2. Could you show me your KLDivergence_loss function? I am implementing it in TensorFlow, but my PyTorch version still does not work.
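
For what it’s worth, here is a minimal sketch of how the mimicry loss could be written with torch.nn.functional.kl_div, assuming out1 and out2 are raw logits. Only the target distribution is detached, so gradients from the mimicry loss still reach the network being updated. The resnet18 models, num_classes=10, the SGD settings, and the dummy image/label tensors are placeholders for demonstration, not your actual setup:

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def mimicry_loss(student_logits, teacher_logits):
    # F.kl_div expects the input as log-probabilities and the target as
    # probabilities. Detaching only the target avoids the
    # "derivative for 'target' is not implemented" error while still
    # letting gradients flow back through student_logits.
    return F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1).detach(),
        reduction="batchmean",
    )

# Placeholder models and data, just to make the sketch runnable.
resnet1 = resnet18(num_classes=10)
resnet2 = resnet18(num_classes=10)
optimizer1 = torch.optim.SGD(resnet1.parameters(), lr=0.1)
image = torch.randn(4, 3, 224, 224)
label = torch.randint(0, 10, (4,))

out1 = resnet1(image)
out2 = resnet2(image)

# Total loss for network 1: supervised loss plus mimicry loss.
loss1 = F.cross_entropy(out1, label) + mimicry_loss(out1, out2)

optimizer1.zero_grad()
loss1.backward()
optimizer1.step()

When you update the second network you would compute the symmetric term, mimicry_loss(out2, out1), so that each network treats the other’s detached output as its target.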