My architecture uses two networks. The first (pretrained) network estimates a quantity for which I have ground-truth data available. The second network takes the first network's output and solves a different, complementary task, so good estimates from the first network make the second task easier to solve. I therefore expected that optimizing the second network would implicitly also train the first network to produce correct outputs.
I have two loss functions (see the sketch below):

- `loss1` trains the first network by comparing its output to the ground-truth data.
- `loss2` trains the second network, using the output of the first network plus data synthesis.
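
To make the setup concrete, here is a minimal sketch of my training step, assuming PyTorch; the model definitions, dimensions, and criteria are illustrative placeholders, not my actual code:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for my real models (the actual architectures differ).
net1 = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 8))  # pretrained estimator
net2 = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))     # complementary task

criterion1 = nn.MSELoss()  # compares net1's output to the ground truth
criterion2 = nn.MSELoss()  # compares net2's output to a synthesized target

optimizer = torch.optim.Adam(
    list(net1.parameters()) + list(net2.parameters()), lr=1e-3
)

x = torch.randn(32, 16)       # input batch
gt = torch.randn(32, 8)       # ground truth for net1's estimate
target2 = torch.randn(32, 4)  # stand-in for the synthesized target of the second task

optimizer.zero_grad()
estimate = net1(x)    # supervised directly by loss1
out = net2(estimate)  # estimate is NOT detached, so loss2 backprops into net1

loss1 = criterion1(estimate, gt)
loss2 = criterion2(out, target2)
(loss1 + loss2).backward()  # both losses write gradients into net1's parameters
optimizer.step()
```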
Unfortunately, the first network's error w.r.t. the ground truth gets worse over time. I suspect this is caused by misleading gradients from `loss2`: they may initially benefit that objective, but they pull the first network away from the ground truth.
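
To check this, I compare the directions of the two gradients on `net1`'s parameters (a rough diagnostic, reusing `net1`, `loss1`, and `loss2` from the sketch above); a negative cosine similarity would mean the two objectives pull `net1` in conflicting directions:

```python
import torch

def grad_vector(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into a single vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])

params = [p for p in net1.parameters() if p.requires_grad]
g1 = grad_vector(loss1, params)
g2 = grad_vector(loss2, params)

# cos < 0: loss2's gradient opposes loss1's on net1's weights.
cos = torch.nn.functional.cosine_similarity(g1, g2, dim=0)
print(f"cosine(grad loss1, grad loss2) on net1: {cos.item():.3f}")
```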
Is there any way to prioritize gradients coming from different loss functions? Can I somehow say: “If the gradients from `loss2` disagree, always choose the ones from `loss1`”? A rough sketch of what I have in mind follows.
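
For instance, I imagine something in the spirit of gradient surgery / PCGrad (Yu et al., 2020), though I have not tried it and the helper below is my own sketch: on `net1`'s parameters, project out the component of `loss2`'s gradient that conflicts with `loss1`'s gradient, and apply `loss2` unchanged to `net2`:

```python
import torch

def prioritized_step(net1, net2, loss1, loss2, optimizer):
    """One update in which loss1 has priority on net1's parameters.

    If loss2's gradient on net1 opposes loss1's (negative dot product),
    the conflicting component is projected out; net2 is trained on
    loss2 as usual.
    """
    p1 = [p for p in net1.parameters() if p.requires_grad]
    p2 = [p for p in net2.parameters() if p.requires_grad]

    g1 = torch.autograd.grad(loss1, p1, retain_graph=True)
    g2_on_net1 = torch.autograd.grad(loss2, p1, retain_graph=True)
    g2_on_net2 = torch.autograd.grad(loss2, p2)

    # Measure the conflict between the two objectives on net1 as a whole.
    f1 = torch.cat([g.reshape(-1) for g in g1])
    f2 = torch.cat([g.reshape(-1) for g in g2_on_net1])
    dot = torch.dot(f1, f2)

    optimizer.zero_grad()
    for p, a, b in zip(p1, g1, g2_on_net1):
        if dot < 0:
            # Remove the part of loss2's gradient that opposes loss1's.
            b = b - (dot / (f1.norm() ** 2 + 1e-12)) * a
        p.grad = a + b
    for p, g in zip(p2, g2_on_net2):
        p.grad = g
    optimizer.step()
```

(If something like this is not possible, I could of course detach `net1`'s output before feeding it to `net2`, but then `loss2` would no longer inform the first network at all, which defeats my original idea.)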
Thanks in advance!