Hi,

I want to implement mutual learning: two ResNets that learn collaboratively. I am confused about how to backpropagate the sum of the two losses.

Each net's loss function has two parts: a supervised loss (cross entropy) and a mimicry loss (the KL divergence between the two networks' outputs). The total loss of a network is just the sum of these two parts.
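In symbols, I believe the per-network objective is (with p_i = softmax(out_i)):

L_1 = L_CE(out1, label) + D_KL(p_2 ‖ p_1)

and symmetrically for the second network, L_2 = L_CE(out2, label) + D_KL(p_1 ‖ p_2).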

The following is a sketch of the loss for one network:

out1 = resnet1(image)

out2 = resnet2(image)

mimicry_loss1 = KLDivergence_loss(out1.detach(), out2.detach())

# I have to detach, otherwise I get: RuntimeError: the derivative for 'target' is not implemented

loss1 = crossentropy(out1, label) + mimicry_loss1  # ResNet1's total loss

loss1.backward()

optimizer1.step() # Each net has its own optimizer

Would that backpropagate the total loss for ResNet1? (I am not sure about the .detach() calls.)
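For context, here is a minimal runnable sketch of what I think one update step should look like, with tiny linear nets standing in for the two ResNets (`net1`, `opt1`, etc. are just placeholder names). Here I detach only the *target* distribution passed to `F.kl_div`, which I believe avoids the RuntimeError while still letting gradients flow through `out1`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-ins for the two ResNets (assumed 10-class outputs).
net1 = nn.Linear(8, 10)
net2 = nn.Linear(8, 10)
opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)  # each net has its own optimizer
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)

image = torch.randn(4, 8)
label = torch.randint(0, 10, (4,))

out1 = net1(image)
out2 = net2(image)

# Mimicry loss for net1: F.kl_div expects log-probabilities as input and
# probabilities as target. Detach only the target (out2), so gradients
# still flow into out1 but not into net2 from this loss.
mimicry_loss1 = F.kl_div(F.log_softmax(out1, dim=1),
                         F.softmax(out2, dim=1).detach(),
                         reduction='batchmean')

# ResNet1's total loss: supervised part + mimicry part.
loss1 = F.cross_entropy(out1, label) + mimicry_loss1

opt1.zero_grad()
loss1.backward()
opt1.step()

# Symmetric update for net2 (recomputing the forward passes).
out1b = net1(image)
out2b = net2(image)
mimicry_loss2 = F.kl_div(F.log_softmax(out2b, dim=1),
                         F.softmax(out1b, dim=1).detach(),
                         reduction='batchmean')
loss2 = F.cross_entropy(out2b, label) + mimicry_loss2
opt2.zero_grad()
loss2.backward()
opt2.step()
```

With both outputs detached, as in my sketch above, the mimicry term would contribute no gradient at all, so only the cross-entropy part would actually train the network.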