I have a loss `L = L_1 + L_2` depending on two terms `L_1` and `L_2`. For my particular problem, I find that `L_1` is often small while `L_2` is very large, so when trying to minimize `L`, the algorithm tends to prioritize minimizing `L_2`. Is there a way to make the gradient contribution of both terms equal, for instance by normalizing them?

I think one way is to measure the magnitudes of the two losses beforehand and then multiply the loss with the larger magnitude by a small coefficient, so that the two losses end up with approximately the same magnitude.
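
The idea above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes you have measured (or estimated) the initial values of the two losses, here called `L1_init` and `L2_init` (hypothetical names), and it computes static weights so that the weighted terms start at the same magnitude.

```python
def balance_coefficients(L1_init, L2_init):
    """Return weights (w1, w2) such that w1 * L1_init == w2 * L2_init.

    The smaller loss keeps weight 1.0; the larger loss is scaled down
    by the ratio of the two initial magnitudes.
    """
    if L1_init <= L2_init:
        return 1.0, L1_init / L2_init
    return L2_init / L1_init, 1.0

# Example: L_1 starts small (0.1) while L_2 starts large (100.0).
w1, w2 = balance_coefficients(0.1, 100.0)
# Both weighted terms now contribute 0.1 to the total loss:
total = w1 * 0.1 + w2 * 100.0
```

Note that this only equalizes the losses at the start of training; if the relative magnitudes drift as training progresses, the weights would need to be re-estimated periodically, which is one step toward adaptive schemes such as gradient normalization.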