I’m trying to understand how the outputs of multiple loss functions within a network are applied to the network. For example, if I have a network with 3 stages, with a loss calculated at the end of each stage (the same loss function), how is that then applied in the following code?
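```python
model = MyModel()                     # 3-stage model that returns one loss per stage
loss1, loss2, loss3 = model(inputs)
total_loss = loss1 + loss2 + loss3    # same loss function at the end of each stage, summed
total_loss.backward()                 # backpropagate the summed loss
```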
Is `total_loss` used to calculate the gradients across the whole network (all 3 stages), or does:

- `loss1` get used to calculate the gradients for stage 1,
- `loss2` for stages 2 and 1, and
- `loss3` for stages 3, 2 and 1?

Or something else?
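One way to probe this numerically, without relying on any particular framework's autograd, is a toy finite-difference check. The sketch below is a hypothetical stand-in for `MyModel`: three scalar "stages", each multiplying by its own weight, with the same squared-error loss after every stage. The names `forward` and `num_grad`, the input `1.5`, the target `2.0` and the weights are all illustrative assumptions, not taken from any real model or library.

```python
# Hypothetical three-stage scalar "network": each stage multiplies by its own weight.
def forward(w, x=1.5, target=2.0):
    """Return the per-stage losses [loss1, loss2, loss3] for weights w = [w1, w2, w3]."""
    h = x
    losses = []
    for wk in w:
        h = wk * h                        # output of stage k feeds stage k+1
        losses.append((h - target) ** 2)  # same loss function at the end of every stage
    return losses


def num_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw_j for each weight w_j."""
    grads = []
    for j in range(len(w)):
        wp = list(w)
        wp[j] += eps
        wm = list(w)
        wm[j] -= eps
        grads.append((f(wp) - f(wm)) / (2 * eps))
    return grads


w = [0.8, 1.1, 0.9]
# Gradient of each stage's loss on its own, with respect to all three stage weights.
per_loss = [num_grad(lambda v, k=k: forward(v)[k], w) for k in range(3)]
# Gradient of the summed loss (the total_loss case).
total = num_grad(lambda v: sum(forward(v)), w)

for k, g in enumerate(per_loss, start=1):
    print(f"d loss{k} / d(w1, w2, w3) =", [round(val, 6) for val in g])
print("d total / d(w1, w2, w3)  =", [round(val, 6) for val in total])
```

In this toy setup, `loss1` has zero gradient with respect to `w2` and `w3`, `loss2` has zero gradient with respect to `w3`, and the gradient of the summed loss matches the element-wise sum of the per-loss gradients (up to floating-point error), because differentiation is linear. So stage 1's weight collects contributions from all three losses, while stage 3's weight only sees `loss3`.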
Thanks, I hope that makes sense. Examples of this staged approach in the wild include the hourglass human pose paper and the convolutional pose machines paper.