Prioritize Gradients from Different Loss Functions
I use two networks in my architecture. The first, pretrained network estimates data for which I have ground-truth information available. The second network uses the output of the first to solve a different task that is complementary to the first one.

That is, good estimates from the first network improve the second network's performance on its task.

I therefore assumed that optimizing the second network would implicitly also train the first network to produce correct outputs.

I have two loss functions:

  • loss1 trains the first network by comparing its output to the ground-truth data.
  • loss2 trains the second network using the output of the first network plus data synthesis.
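For concreteness, the setup above could be sketched like this in PyTorch (all names here are illustrative stand-ins, not my actual models):

```python
# Minimal sketch: net1 is supervised by ground truth (loss1); net2 consumes
# net1's output, and its loss (loss2) also backpropagates into net1.
import torch
import torch.nn as nn

net1 = nn.Linear(8, 4)   # stand-in for the pretrained first network
net2 = nn.Linear(4, 2)   # stand-in for the second network

x = torch.randn(16, 8)         # input batch
gt = torch.randn(16, 4)        # ground truth for net1's output
target2 = torch.randn(16, 2)   # target for the second task

out1 = net1(x)
loss1 = nn.functional.mse_loss(out1, gt)             # supervises net1 directly
loss2 = nn.functional.mse_loss(net2(out1), target2)  # flows back into net1 too

opt = torch.optim.SGD(list(net1.parameters()) + list(net2.parameters()), lr=0.01)
opt.zero_grad()
(loss1 + loss2).backward()  # net1 receives gradients from BOTH losses
opt.step()
```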

Unfortunately, the error of the first network's output with respect to the ground truth is getting worse over time.

I suspect this is caused by misleading gradients from loss2, which may initially benefit that objective while degrading the first network's accuracy.

Is there any way to prioritize gradients coming from different loss functions?

Can I somehow say: “If gradients from loss1 and loss2 differ, always choose the loss1-gradient”?

Thanks in advance!

(Juan F Montesinos) #2

Well, you can do that if you can code it.
For example, gradient clipping is nothing but a normalization of the gradients. You can inspect and modify the gradients after calling backward().
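As a sketch of that idea (names and networks are illustrative, and "differ" is interpreted here as elementwise sign disagreement, which is one possible metric among many): compute the two gradient sets separately with torch.autograd.grad, then keep loss2's contribution only where it agrees with loss1's.

```python
# Compute loss1's and loss2's gradients w.r.t. net1 separately, then combine
# them, dropping loss2's contribution wherever its sign conflicts with loss1's.
import torch
import torch.nn as nn

net1 = nn.Linear(8, 4)
net2 = nn.Linear(4, 2)
x, gt, target2 = torch.randn(16, 8), torch.randn(16, 4), torch.randn(16, 2)

out1 = net1(x)
loss1 = nn.functional.mse_loss(out1, gt)
loss2 = nn.functional.mse_loss(net2(out1), target2)

params1 = list(net1.parameters())
g1 = torch.autograd.grad(loss1, params1, retain_graph=True)
g2 = torch.autograd.grad(loss2, params1, retain_graph=True)

for p, a, b in zip(params1, g1, g2):
    agree = (a.sign() == b.sign())  # elementwise agreement mask
    p.grad = a + b * agree          # keep loss2's gradient only where it agrees

# net2 itself is still trained by loss2 as usual:
loss2.backward(inputs=list(net2.parameters()))
```

After this, a regular optimizer.step() would apply the filtered gradients.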

If you can find a good metric to compare them, you can propagate gradients from only one of the signals.
However, I don't believe the gradients will be the same in most cases. It may be easier to simply detach the output of the first network from the backprop of the second, so that only the loss1 signal flows into the first network.
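The detach option is a one-line change (again with illustrative stand-in networks): the second network still trains on the first network's output, but loss2's gradient no longer reaches the first network.

```python
# Detach net1's output before feeding it to net2: loss2 then updates only
# net2, while net1 is updated by loss1 alone.
import torch
import torch.nn as nn

net1 = nn.Linear(8, 4)
net2 = nn.Linear(4, 2)
x, gt, target2 = torch.randn(16, 8), torch.randn(16, 4), torch.randn(16, 2)

out1 = net1(x)
loss1 = nn.functional.mse_loss(out1, gt)
loss2 = nn.functional.mse_loss(net2(out1.detach()), target2)  # cut the graph here

(loss1 + loss2).backward()
# net1's gradients now come from loss1 alone; net2's come from loss2 alone.
```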