I have a loss `L = L_1 + L_2` depending on two terms `L_1` and `L_2`. For my particular problem, I find that `L_1` is often small while `L_2` is very large, so when trying to minimize `L`, the algorithm tends to prioritize minimizing `L_2`. Is there a way to make the gradient contribution of both terms equal, for instance by normalizing them?

I think one way is to measure the magnitudes of the two losses beforehand and then multiply the loss with the larger magnitude by a small coefficient, so that the two losses end up with approximately the same magnitude.
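
The idea above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes you have measured (or estimated) the initial values of the two losses, here called `L1_init` and `L2_init` (hypothetical names), and it computes static weights so that the weighted terms start at the same magnitude.

```python
def balance_coefficients(L1_init, L2_init):
    """Return weights (w1, w2) such that w1 * L1_init == w2 * L2_init.

    The smaller loss keeps weight 1.0; the larger loss is scaled down
    by the ratio of the two initial magnitudes.
    """
    if L1_init <= L2_init:
        return 1.0, L1_init / L2_init
    return L2_init / L1_init, 1.0

# Example: L_1 starts small (0.1) while L_2 starts large (100.0).
w1, w2 = balance_coefficients(0.1, 100.0)
# Both weighted terms now contribute 0.1 to the total loss:
total = w1 * 0.1 + w2 * 100.0
```

Note that this only equalizes the losses at the start of training; if the relative magnitudes drift as training progresses, the weights would need to be re-estimated periodically, which is one step toward adaptive schemes such as gradient normalization.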