I have a loss `L = L_1 + L_2` depending on two terms `L_1` and `L_2`. For my particular problem, I find that `L_1` is often small while `L_2` is very large, so when trying to minimize `L`, the algorithm tends to prioritize minimizing `L_2`. Is there a way to make the gradient contribution of both terms equal, for instance by normalizing them?

I think one way is to measure the magnitudes of the two losses beforehand and then multiply the loss with the larger magnitude by a small coefficient, so that the two losses end up with approximately the same magnitude.
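
The idea above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes you have measured (or estimated) the initial values of the two losses, here called `L1_init` and `L2_init` (hypothetical names), and it computes static weights so that the weighted terms start at the same magnitude.

```python
def balance_coefficients(L1_init, L2_init):
    """Return weights (w1, w2) such that w1 * L1_init == w2 * L2_init.

    The smaller loss keeps weight 1.0; the larger loss is scaled down
    by the ratio of the two initial magnitudes.
    """
    if L1_init <= L2_init:
        return 1.0, L1_init / L2_init
    return L2_init / L1_init, 1.0

# Example: L_1 starts small (0.1) while L_2 starts large (100.0).
w1, w2 = balance_coefficients(0.1, 100.0)
# Both weighted terms now contribute 0.1 to the total loss:
total = w1 * 0.1 + w2 * 100.0
```

Note that this only equalizes the losses at the start of training; if the relative magnitudes drift as training progresses, the weights would need to be re-estimated periodically, which is one step toward adaptive schemes such as gradient normalization.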