Given a model (e.g. a CNN) with two losses, where the first loss (loss1) is computed halfway through the network and the second loss (loss2) is computed at the end. These two losses are summed (total_loss = loss1 + loss2) and we call total_loss.backward(). Will the gradients of the second half of the CNN be affected by loss1 (i.e. would the gradients of the second half be the same if we called loss2.backward() instead)?

Hi,

No, it won’t: loss1 does not depend on the parameters of the second half of your net, so for any such parameter w, dloss1/dw = 0. The contribution of loss1 to those gradients is exactly zero (in practice, no gradient even flows from loss1 into that part of the graph), so total_loss.backward() and loss2.backward() produce identical gradients for the second half.
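You can verify this yourself with a small sketch (the two `nn.Linear` stages and the squared-mean losses below are just stand-ins for your CNN halves and your actual losses):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a model with two halves: loss1 is computed on the
# intermediate activations, loss2 on the final output.
first_half = nn.Linear(4, 4)
second_half = nn.Linear(4, 2)
x = torch.randn(8, 4)

def compute_losses():
    mid = first_half(x)
    out = second_half(torch.relu(mid))
    loss1 = mid.pow(2).mean()   # auxiliary loss, halfway through
    loss2 = out.pow(2).mean()   # final loss
    return loss1, loss2

# Backprop the summed loss.
loss1, loss2 = compute_losses()
(loss1 + loss2).backward()
grad_sum = second_half.weight.grad.clone()

# Reset gradients, then backprop loss2 alone.
first_half.zero_grad()
second_half.zero_grad()
loss1, loss2 = compute_losses()
loss2.backward()
grad_loss2_only = second_half.weight.grad.clone()

# The second half's gradients are identical: dloss1/dw = 0 there.
print(torch.allclose(grad_sum, grad_loss2_only))  # True

# The first half's gradients, by contrast, DO differ, since both
# losses depend on its parameters.
```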