I am training a network for two separate tasks. My feature extractor F (e.g., a ResNet) feeds its features into two small networks of fully connected layers (and some nonlinearities), T1 and T2. So,
T1(F(Input1)) = Output 1
T2(F(Input2)) = Output 2
I’ve set up the model such that its input is input1 and input2:
pred1, pred2 = model(input1, input2)
loss = lossfn(pred1, label1) + lossfn(pred2, label2)
Here is my concern: since inputs 1 and 2 share the same feature extractor, I am afraid the gradients get mixed up. In code, I essentially have a feature extractor + a task-1 network + a task-2 network. Although task 1 only receives input1 (i.e., the first argument of model(., .)), the backward path through the feature extractor is shared with task 2.
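For reference, here is a minimal sketch of the setup I mean (the layer sizes and the `MultiTaskModel` name are just illustrative stand-ins, not my real architecture):

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, in_dim=8, feat_dim=16, out_dim=4):
        super().__init__()
        # Stand-in for the ResNet feature extractor F.
        self.F = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.T1 = nn.Linear(feat_dim, out_dim)  # task-1 head
        self.T2 = nn.Linear(feat_dim, out_dim)  # task-2 head

    def forward(self, input1, input2):
        # Each head only sees its own input; F is shared by both.
        pred1 = self.T1(self.F(input1)) if input1 is not None else None
        pred2 = self.T2(self.F(input2)) if input2 is not None else None
        return pred1, pred2

model = MultiTaskModel()
lossfn = nn.MSELoss()
input1, input2 = torch.randn(2, 8), torch.randn(2, 8)
label1, label2 = torch.randn(2, 4), torch.randn(2, 4)

pred1, pred2 = model(input1, input2)
loss = lossfn(pred1, label1) + lossfn(pred2, label2)
loss.backward()
# After backward: T1 only gets gradients from the first loss term,
# T2 only from the second, and F accumulates the sum of both.
```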
Alternatively, I did this:
pred1, _ = model(input1, None)
loss1 = lossfn(pred1, label1)
_, pred2 = model(None, input2)
loss2 = lossfn(pred2, label2)
This way, I only use the backward path once per task.
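To see whether the two variants even differ, I compared them on a toy model (a single linear layer standing in for F; the names here are illustrative). Since `.grad` accumulates across `backward()` calls, summing the losses and calling `backward()` once should give the same gradients on the shared extractor as two separate backward passes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
F = nn.Linear(8, 16)                       # stand-in for the shared extractor
T1, T2 = nn.Linear(16, 4), nn.Linear(16, 4)
lossfn = nn.MSELoss()
x1, x2 = torch.randn(2, 8), torch.randn(2, 8)
y1, y2 = torch.randn(2, 4), torch.randn(2, 4)

# Variant 1: sum the losses, one backward pass.
loss = lossfn(T1(F(x1)), y1) + lossfn(T2(F(x2)), y2)
loss.backward()
g_combined = F.weight.grad.clone()

# Variant 2: one backward pass per task; gradients accumulate in .grad.
F.zero_grad(); T1.zero_grad(); T2.zero_grad()
lossfn(T1(F(x1)), y1).backward()
lossfn(T2(F(x2)), y2).backward()
g_separate = F.weight.grad.clone()

print(torch.allclose(g_combined, g_separate, atol=1e-6))
```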
Is this the right way to do it?
I essentially want to train both tasks simultaneously, without one task improving at the expense of accuracy on the other.
The inputs are not the same, but they are similar, so the feature extractor can actually extract reasonable representations for both.