Multi-task learning: backward pass on intermediate loss?

AIJoris · March 27, 2019, 9:07am

Hi,

I’m using a seq2seq transformer network in a multi-task learning setting. I have a main text generation task and an auxiliary classification task that makes its prediction from an intermediate output. For both tasks I do a full forward pass, but for the auxiliary task I only use the output of an intermediate layer to compute the loss.

My question is what happens when I do a backward pass on this intermediate loss. I would think the forward pass builds a computational graph for the entire network, but since the loss only uses part of this graph, I assume the parameters that come after this intermediate layer will not be affected. Is this reasoning correct?
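The question can be checked on a toy model. This is a minimal sketch, assuming a hypothetical split of the network into three modules: `early` (layers up to the intermediate output), `late` (layers after it, standing in for the generation head) and `aux_head` (a hypothetical auxiliary classifier). It runs the full forward pass, calls backward on the auxiliary loss only, and then inspects which parameters received a gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the seq2seq network, split at the intermediate output.
early = nn.Linear(4, 8)     # layers before the intermediate output
late = nn.Linear(8, 3)      # layers after it (main generation path)
aux_head = nn.Linear(8, 2)  # hypothetical auxiliary classifier on the intermediate output

x = torch.randn(5, 4)

# Full forward pass, keeping the intermediate activation.
hidden = torch.tanh(early(x))
main_out = late(hidden)  # computed, but not used by the auxiliary loss
aux_out = aux_head(hidden)

aux_loss = F.cross_entropy(aux_out, torch.randint(0, 2, (5,)))

# Backward pass on the intermediate (auxiliary) loss only.
aux_loss.backward()

# Layers that feed the auxiliary loss receive gradients ...
print(early.weight.grad is not None)     # True
print(aux_head.weight.grad is not None)  # True
# ... but "late" is not part of aux_loss's graph, so its .grad stays None.
print(late.weight.grad is None)          # True
```

Autograd only walks the part of the graph that the loss actually depends on, so building the rest of the graph during the forward pass does not by itself give `late` a gradient.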

MariosOreo · March 28, 2019, 7:57am

Hello,

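AFAIK, performing a backward pass on the summed loss is equivalent to calling backward on each loss separately, because the gradients accumulate in each parameter's .grad attribute.
The following two cases are the same: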

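# Case 1: add the losses, then call backward once
loss = loss1 + loss2
loss.backward()

# Case 2: call backward on each loss; the gradients accumulate in .grad
loss1.backward(retain_graph=True)  # keep the shared graph for loss2's backward pass
loss2.backward()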

If I am wrong, please correct me.
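The claimed equivalence can be tested numerically. A minimal sketch, assuming a hypothetical two-layer toy model where `loss1` depends only on the shared `early` layer and `loss2` also passes through `late`: it computes the gradients both ways from a fresh forward pass and compares them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model: both losses depend on "early"; only loss2 depends on "late".
early = nn.Linear(4, 8)
late = nn.Linear(8, 1)
params = list(early.parameters()) + list(late.parameters())
x = torch.randn(5, 4)


def compute_losses():
    """Fresh forward pass returning (loss1, loss2) built on a shared hidden layer."""
    hidden = torch.tanh(early(x))
    loss1 = hidden.pow(2).mean()        # stands in for the auxiliary loss
    loss2 = late(hidden).pow(2).mean()  # stands in for the main loss
    return loss1, loss2


def clear_grads():
    for p in params:
        p.grad = None


# Case 1: add the losses, then call backward once.
clear_grads()
loss1, loss2 = compute_losses()
(loss1 + loss2).backward()
summed = [p.grad.clone() for p in params]

# Case 2: call backward on each loss; the two results accumulate in .grad.
clear_grads()
loss1, loss2 = compute_losses()
loss1.backward(retain_graph=True)  # loss2 still needs the shared part of the graph
loss2.backward()
separate = [p.grad.clone() for p in params]

print(all(torch.allclose(a, b) for a, b in zip(summed, separate)))  # True
```

`retain_graph=True` is needed in the second case because both losses reuse the saved tensors of the shared `early`/`tanh` part of the graph; without it, the second backward call would fail.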

AIJoris · March 29, 2019, 9:24am

Thanks for your answer. However, the crux of my question is this: if I do a full forward pass but compute a loss from an intermediate network output and backpropagate it, will the parameters of the layers after that intermediate output be updated or not?
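Whether the later parameters actually change also depends on the optimizer step. A minimal sketch, assuming the same hypothetical `early` / `late` / `aux_head` split and plain `torch.optim.SGD` without momentum or weight decay: after backpropagating only the auxiliary loss and stepping, the weights of `late` are compared with a copy taken beforehand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical model split at the intermediate output, as in the question.
early = nn.Linear(4, 8)
late = nn.Linear(8, 3)
aux_head = nn.Linear(8, 2)

params = list(early.parameters()) + list(late.parameters()) + list(aux_head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)  # plain SGD: no momentum, no weight decay

late_before = late.weight.detach().clone()
early_before = early.weight.detach().clone()

x = torch.randn(5, 4)
hidden = torch.tanh(early(x))
_ = late(hidden)  # full forward pass; this output is not used by the auxiliary loss
aux_loss = F.cross_entropy(aux_head(hidden), torch.randint(0, 2, (5,)))

optimizer.zero_grad()
aux_loss.backward()
optimizer.step()  # "late" gets no gradient from aux_loss, so SGD leaves it as is

print(torch.equal(late.weight, late_before))    # True: "late" was not updated
print(torch.equal(early.weight, early_before))  # False: "early" was updated
```

This relies on the later layers having no gradient (or a zero gradient) when `step()` runs. If they instead hold zero-filled gradients and the optimizer uses momentum or weight decay, those terms could still move them, so it is worth checking how gradients are cleared between steps.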

Ali_Mirzaeyan · May 25, 2019, 2:50pm

Hi,
I don’t think these two expressions are the same, because the loss is just a number, so when you backpropagate the sum of these two losses, the auxiliary loss will also propagate into the entire network, which is not what Joris intended. So I think the second form makes more sense for the discussed problem.
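One way to test this concern is to compare the gradient that the later layer receives from the main loss alone with the one it receives from the summed loss. A minimal sketch, assuming the same hypothetical `early` / `late` / `aux_head` split and cross-entropy for both tasks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical model: "late" sits after the intermediate output used by the auxiliary head.
early = nn.Linear(4, 8)
late = nn.Linear(8, 3)
aux_head = nn.Linear(8, 2)

x = torch.randn(5, 4)
main_target = torch.randint(0, 3, (5,))
aux_target = torch.randint(0, 2, (5,))


def forward():
    """Fresh forward pass returning (main_loss, aux_loss)."""
    hidden = torch.tanh(early(x))
    main_loss = F.cross_entropy(late(hidden), main_target)
    aux_loss = F.cross_entropy(aux_head(hidden), aux_target)
    return main_loss, aux_loss


# Gradients from the main loss alone.
main_loss, _ = forward()
main_loss.backward()
late_grad_main_only = late.weight.grad.clone()
early_grad_main_only = early.weight.grad.clone()

# Gradients from the summed loss, starting from cleared gradients.
for m in (early, late, aux_head):
    m.zero_grad()
main_loss, aux_loss = forward()
(main_loss + aux_loss).backward()
late_grad_summed = late.weight.grad.clone()
early_grad_summed = early.weight.grad.clone()

# aux_loss has no path to "late", so it adds nothing to late's gradient ...
print(torch.allclose(late_grad_main_only, late_grad_summed))    # True
# ... while the shared "early" layer does receive the auxiliary contribution.
print(torch.allclose(early_grad_main_only, early_grad_summed))  # False
```

In this sketch the derivative of the auxiliary loss with respect to `late` is zero, so summing the losses only changes the gradients of layers that the auxiliary loss actually depends on.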