How do multiple loss functions apply during backward

luke · August 31, 2017, 5:32am

I’m trying to understand how the losses from multiple loss functions within a network are applied to that network. For example, if I have a network with 3 stages and a loss is calculated at the end of each stage (using the same loss function), how are those losses then applied in .backward()?

...
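model = MyModel()
# the model returns one loss per stage
loss1, loss2, loss3 = model(inputs)

# combine the three stage losses into a single scalar
total_loss = loss1 + loss2 + loss3

# clear old gradients, then backpropagate the combined loss
optimizer.zero_grad()
total_loss.backward()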
...

Is total_loss used to calculate the gradients across the whole network (all 3 stages)? Or is loss1 used to calculate the gradients for stage 1 only, loss2 for stage 2 AND stage 1, and loss3 for stage 3, stage 2 AND stage 1? Or something else?

I hope that makes sense. Some examples of this staged approach in the wild are the hourglass human pose paper and the convolutional pose machines paper.

Thanks,
Luke
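
One way to reason about the question is with a toy example. The sketch below is plain Python, not PyTorch: it assumes a hypothetical 3-stage scalar "network" in which each stage multiplies its input by a single weight, and it uses finite differences in place of autograd. All names (forward, numeric_grad, w, x, target) are made up for illustration. It suggests that the two readings in the question describe the same thing: the gradient of the summed loss is the sum of the per-loss gradients, and each loss only contributes to the stages that come at or before it.

```python
# Toy 3-stage scalar "network": each stage multiplies by its own weight.
# Hypothetical illustration only; this is not the poster's MyModel.

def forward(w, x, target):
    """Return (loss1, loss2, loss3) for weights w = [w1, w2, w3]."""
    h1 = w[0] * x    # stage 1 output
    h2 = w[1] * h1   # stage 2 output, depends on stage 1
    h3 = w[2] * h2   # stage 3 output, depends on stages 1 and 2
    # Same squared-error loss applied at the end of every stage.
    return tuple((h - target) ** 2 for h in (h1, h2, h3))

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of scalar f for each weight."""
    grads = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

w, x, target = [0.5, -1.5, 2.0], 1.0, 1.0

# Gradient of each loss on its own: per_loss[k][i] = d loss(k+1) / d w(i+1).
per_loss = [
    numeric_grad(lambda v, k=k: forward(v, x, target)[k], w)
    for k in range(3)
]

# Gradient of the summed loss, as in total_loss = loss1 + loss2 + loss3.
total = numeric_grad(lambda v: sum(forward(v, x, target)), w)

# Sum of the individual gradients, weight by weight.
summed = [sum(g[i] for g in per_loss) for i in range(3)]

for k, g in enumerate(per_loss, start=1):
    print(f"grad of loss{k} w.r.t. [w1, w2, w3]:", g)
print("grad of total_loss:", total)
print("sum of per-loss grads:", summed)
```

In this toy setup, loss1 has zero gradient for the stage 2 and stage 3 weights, loss2 has zero gradient for the stage 3 weight, and loss3 reaches all three. In PyTorch, total_loss.backward() should give the same pattern, because autograd differentiates the sum through the one shared graph, so stage 1 parameters accumulate gradient from all three losses.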