My architecture uses two networks. The first (pretrained) network estimates a quantity for which I have ground-truth data available. The second network takes the first network's output and solves a different, complementary task, so good estimates from the first network make the second task easier to solve. I therefore expected that optimizing the second network would implicitly also train the first network to produce correct outputs.
I have two loss functions (see the sketch below):

- `loss1` trains the first network by comparing its output to the ground-truth data.
- `loss2` trains the second network, using the output of the first network plus data synthesis.
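
To make the setup concrete, here is a minimal sketch of my training step, assuming PyTorch; the model definitions, dimensions, and criteria are illustrative placeholders, not my actual code:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for my real models (the actual architectures differ).
net1 = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 8))  # pretrained estimator
net2 = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))     # complementary task

criterion1 = nn.MSELoss()  # compares net1's output to the ground truth
criterion2 = nn.MSELoss()  # compares net2's output to a synthesized target

optimizer = torch.optim.Adam(
    list(net1.parameters()) + list(net2.parameters()), lr=1e-3
)

x = torch.randn(32, 16)       # input batch
gt = torch.randn(32, 8)       # ground truth for net1's estimate
target2 = torch.randn(32, 4)  # stand-in for the synthesized target of the second task

optimizer.zero_grad()
estimate = net1(x)    # supervised directly by loss1
out = net2(estimate)  # estimate is NOT detached, so loss2 backprops into net1

loss1 = criterion1(estimate, gt)
loss2 = criterion2(out, target2)
(loss1 + loss2).backward()  # both losses write gradients into net1's parameters
optimizer.step()
```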
Unfortunately, the first network's error w.r.t. the ground truth gets worse over time. I suspect this is caused by misleading gradients from `loss2`: they may initially benefit that objective, but they pull the first network away from the ground truth.
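
To check this, I compare the directions of the two gradients on `net1`'s parameters (a rough diagnostic, reusing `net1`, `loss1`, and `loss2` from the sketch above); a negative cosine similarity would mean the two objectives pull `net1` in conflicting directions:

```python
import torch

def grad_vector(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into a single vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])

params = [p for p in net1.parameters() if p.requires_grad]
g1 = grad_vector(loss1, params)
g2 = grad_vector(loss2, params)

# cos < 0: loss2's gradient opposes loss1's on net1's weights.
cos = torch.nn.functional.cosine_similarity(g1, g2, dim=0)
print(f"cosine(grad loss1, grad loss2) on net1: {cos.item():.3f}")
```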
Is there any way to prioritize gradients coming from different loss functions? Can I somehow say: “If the gradients from `loss2` disagree, always choose the ones from `loss1`”? A rough sketch of what I have in mind follows.
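
For instance, I imagine something in the spirit of gradient surgery / PCGrad (Yu et al., 2020), though I have not tried it and the helper below is my own sketch: on `net1`'s parameters, project out the component of `loss2`'s gradient that conflicts with `loss1`'s gradient, and apply `loss2` unchanged to `net2`:

```python
import torch

def prioritized_step(net1, net2, loss1, loss2, optimizer):
    """One update in which loss1 has priority on net1's parameters.

    If loss2's gradient on net1 opposes loss1's (negative dot product),
    the conflicting component is projected out; net2 is trained on
    loss2 as usual.
    """
    p1 = [p for p in net1.parameters() if p.requires_grad]
    p2 = [p for p in net2.parameters() if p.requires_grad]

    g1 = torch.autograd.grad(loss1, p1, retain_graph=True)
    g2_on_net1 = torch.autograd.grad(loss2, p1, retain_graph=True)
    g2_on_net2 = torch.autograd.grad(loss2, p2)

    # Measure the conflict between the two objectives on net1 as a whole.
    f1 = torch.cat([g.reshape(-1) for g in g1])
    f2 = torch.cat([g.reshape(-1) for g in g2_on_net1])
    dot = torch.dot(f1, f2)

    optimizer.zero_grad()
    for p, a, b in zip(p1, g1, g2_on_net1):
        if dot < 0:
            # Remove the part of loss2's gradient that opposes loss1's.
            b = b - (dot / (f1.norm() ** 2 + 1e-12)) * a
        p.grad = a + b
    for p, g in zip(p2, g2_on_net2):
        p.grad = g
    optimizer.step()
```

(If something like this is not possible, I could of course detach `net1`'s output before feeding it to `net2`, but then `loss2` would no longer inform the first network at all, which defeats my original idea.)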
Thanks in advance!