I’m trying to understand how the losses from multiple loss functions within a network are applied to that network during the backward pass. For example, if I have a network with 3 stages and a loss is calculated at the end of each stage (using the same loss function), how are those losses applied in a training step like this:
    ...
    model = MyModel()
    loss1, loss2, loss3 = model(inputs)
    total_loss = loss1 + loss2 + loss3
    optimizer.zero_grad()
    total_loss.backward()
    ...
Is total_loss used to calculate the grads across the whole network (all 3 stages), or is
loss1 used to calculate the grad for stage 1,
loss2 used for stage 2 AND stage 1, and
loss3 used for stage 3, stage 2 AND stage 1? Or something else?
Thanks, I hope that makes sense. Some examples of this staged approach in the wild are the hourglass human pose paper and the convolutional pose machines paper.
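In case it helps frame the question, here is a minimal sketch of how this could be checked empirically. ThreeStage is a hypothetical stand-in for MyModel: the linear layers, their sizes, the MSELoss criterion and the random data are all assumptions for illustration and are not taken from either paper. It computes the gradient of each loss separately with torch.autograd.grad and compares the result with what total_loss.backward() leaves in .grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ThreeStage(nn.Module):
    """Hypothetical stand-in for MyModel: three stacked stages, one loss per stage."""

    def __init__(self):
        super().__init__()
        # Assumed toy stages; a real model would use conv blocks, hourglasses, etc.
        self.stage1 = nn.Linear(4, 4)
        self.stage2 = nn.Linear(4, 4)
        self.stage3 = nn.Linear(4, 4)
        self.criterion = nn.MSELoss()  # same loss function at every stage

    def forward(self, x, target):
        out1 = self.stage1(x)
        out2 = self.stage2(out1)  # stage 2 consumes stage 1's output
        out3 = self.stage3(out2)  # stage 3 consumes stage 2's output
        return (
            self.criterion(out1, target),
            self.criterion(out2, target),
            self.criterion(out3, target),
        )


model = ThreeStage()
x, target = torch.randn(8, 4), torch.randn(8, 4)
loss1, loss2, loss3 = model(x, target)

w1 = model.stage1.weight
w3 = model.stage3.weight

# Gradient of each individual loss w.r.t. the stage 1 weights.
g1, g2, g3 = [
    torch.autograd.grad(loss, w1, retain_graph=True)[0]
    for loss in (loss1, loss2, loss3)
]

# Gradient of loss1 and of loss3 w.r.t. the stage 3 weights.
# allow_unused=True returns None when the loss does not depend on the input.
g_loss1_w3 = torch.autograd.grad(loss1, w3, retain_graph=True, allow_unused=True)[0]
g_loss3_w3 = torch.autograd.grad(loss3, w3, retain_graph=True)[0]

# The training step from the question (optimizer omitted; .grad starts empty here).
total_loss = loss1 + loss2 + loss3
total_loss.backward()

stage1_is_sum_of_all = torch.allclose(w1.grad, g1 + g2 + g3, atol=1e-6)
stage3_is_loss3_only = torch.allclose(w3.grad, g_loss3_w3, atol=1e-6)
loss1_reaches_stage3 = g_loss1_w3 is not None

print("stage 1 grad == sum of grads from loss1, loss2, loss3:", stage1_is_sum_of_all)
print("stage 3 grad == grad from loss3 alone:", stage3_is_loss3_only)
print("loss1 contributes to stage 3:", loss1_reaches_stage3)
```

If autograd behaves the way I expect, each loss should only send gradients to the stages that were used to compute it, and backward() on the sum should simply add those contributions together for each parameter. That would mean the two descriptions in my question are really the same thing viewed from different sides.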