What is the right way to train a network for two tasks simultaneously?

Ozan · May 5, 2019, 4:51pm

Hi,

I am training a network for two separate tasks. The features from my feature extractor F (e.g., a ResNet) are fed to two small networks, T1 and T2, each made of fully connected layers (and some nonlinearities). So,

T1(F(Input1)) = Output 1
T2(F(Input2)) = Output 2

I’ve set up the model such that its input is input1 and input2:

pred1, pred2 = model(input1, input2)

loss = lossfn(pred1, label1) + lossfn(pred2, label2)
optimizer.zero_grad()
loss.backward()
optimizer.step()
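
For reference, the setup described above could be sketched as the self-contained script below. This is only an illustration under assumptions: the class name TwoTaskNet, the layer sizes, the cross-entropy loss, and the random data are all hypothetical, and a small linear layer stands in for the ResNet feature extractor.

```python
import torch
import torch.nn as nn


class TwoTaskNet(nn.Module):
    """Shared feature extractor F with two task heads T1 and T2 (hypothetical sizes)."""

    def __init__(self, in_dim=16, feat_dim=32, out1=3, out2=5):
        super().__init__()
        # Stand-in for the shared extractor F (a ResNet in the post).
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Small fully connected heads, one per task.
        self.head1 = nn.Sequential(nn.Linear(feat_dim, 16), nn.ReLU(), nn.Linear(16, out1))
        self.head2 = nn.Sequential(nn.Linear(feat_dim, 16), nn.ReLU(), nn.Linear(16, out2))

    def forward(self, input1=None, input2=None):
        # Each head only sees features of its own input; passing None skips that branch.
        pred1 = self.head1(self.features(input1)) if input1 is not None else None
        pred2 = self.head2(self.features(input2)) if input2 is not None else None
        return pred1, pred2


model = TwoTaskNet()
lossfn = nn.CrossEntropyLoss()  # assumed loss; any per-task loss works the same way
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random placeholder batches for the two tasks.
input1, label1 = torch.randn(4, 16), torch.randint(0, 3, (4,))
input2, label2 = torch.randn(4, 16), torch.randint(0, 5, (4,))

# One joint training step on the summed loss.
pred1, pred2 = model(input1, input2)
loss = lossfn(pred1, label1) + lossfn(pred2, label2)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this sketch each head receives gradients only from its own loss term, while the shared extractor receives contributions from both terms.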

Here I have a problem. Since inputs 1 and 2 share the same feature extractor, I am afraid that the gradients are corrupted. In code, I essentially have a feature extractor + task1 network + task2 network. Although task1 only receives input from input1 (i.e., the first argument of model(., .)), the backward path is shared with task2.

Alternatively, I did this:

pred1, _ = model(input1, None)
optimizer.zero_grad()
loss = lossfn(pred1, label1)
loss.backward()
optimizer.step()

_, pred2 = model(None, input2)
optimizer.zero_grad()
loss = lossfn(pred2, label2)
loss.backward()
optimizer.step()

This way, I only use the backward path once per task.

Is this the right way to do it?
I essentially want to train for both tasks simultaneously, without letting one task improve at the cost of accuracy on the other.

The inputs are not the same, but they are similar, so the feature extractor can actually extract reasonable representations.

Thank you!

MariosOreo · May 6, 2019, 6:39am

Hello @Ozan,

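These two approaches do essentially the same thing and should give similar results; the first way is neater. Because backward() accumulates the gradient of each task’s loss into the shared parameters, the gradient of the summed loss is simply the sum of the per-task gradients, so you don’t need to worry that the gradients will be corrupted. (The results will not be exactly identical, since the second way takes two optimizer steps per iteration instead of one.)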

Thus, both ways are correct in your use case.
(If I misunderstood, please correct me.)
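
The point about gradient accumulation can be checked numerically. The sketch below is only an illustration under assumptions: the small linear layers F, T1, T2, the MSE loss, and the random data are hypothetical stand-ins for the real networks. It compares the gradient of the summed loss on the shared extractor with the sum of the two per-task gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: shared extractor F and task heads T1, T2.
F = nn.Linear(8, 6)
T1 = nn.Linear(6, 2)
T2 = nn.Linear(6, 2)
lossfn = nn.MSELoss()

input1, label1 = torch.randn(4, 8), torch.randn(4, 2)
input2, label2 = torch.randn(4, 8), torch.randn(4, 2)

loss1 = lossfn(T1(F(input1)), label1)
loss2 = lossfn(T2(F(input2)), label2)

shared_params = list(F.parameters())


def shared_grads(loss):
    """Gradients of `loss` w.r.t. the shared extractor's parameters."""
    # retain_graph=True lets us differentiate the same graph several times.
    return torch.autograd.grad(loss, shared_params, retain_graph=True)


g_sum = shared_grads(loss1 + loss2)  # gradient of the combined loss
g1 = shared_grads(loss1)             # gradient from task 1 only
g2 = shared_grads(loss2)             # gradient from task 2 only

# The combined gradient should equal the sum of the per-task gradients.
grads_match = all(
    torch.allclose(gs, a + b, atol=1e-6) for gs, a, b in zip(g_sum, g1, g2)
)
print("gradient of summed loss == sum of per-task gradients:", grads_match)
```

Note that this equality holds when both gradients are taken at the same weights. In the two-step version, the second loss is evaluated after the first optimizer.step(), so its gradient is computed at already-updated weights.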