What should be the ideal constraint for calculating weighted loss?

Jimut123 · June 24, 2022, 11:33am

We can calculate weighted loss by defining a custom loss like this:

custom_loss = l_1 * loss_1 + l_2 * loss_2 + … + l_N * loss_N

Say we are using loss_1 as MAE and loss_2 as MSE, etc.
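For concreteness, here is a minimal sketch of such a combined loss in PyTorch, assuming just two terms (MAE and MSE). The function name `combined_loss` and the default weights `(0.5, 0.5)` are placeholders for illustration, not recommended values.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, weights=(0.5, 0.5)):
    """Weighted sum of MAE and MSE. The weights are hypothetical placeholders."""
    l_mae, l_mse = weights
    mae = F.l1_loss(pred, target)    # mean absolute error
    mse = F.mse_loss(pred, target)   # mean squared error
    return l_mae * mae + l_mse * mse

# Toy data, made up for this example.
pred = torch.tensor([1.0, 2.0, 4.0], requires_grad=True)
target = torch.tensor([1.0, 3.0, 2.0])

loss = combined_loss(pred, target)
loss.backward()  # gradients flow through both terms
```

Both terms here use the default `mean` reduction, so each is already averaged over the elements before it is weighted.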

I wanted to know: should the l_i’s always be constrained to sum to 1, as in a weighted average?

l_1 + l_2 + … + l_N = 1

Or could the sum be greater than 1?

What are the pros and cons of choosing a sum larger than 1?

Also note that I will be using Adam optimizer for this purpose.

ptrblck · June 25, 2022, 6:46am

I think the weighting can sum to any value and would just scale the loss and thus the gradients.
If you’ve already adapted the learning rate to the loss scale, you might want to keep the sum of the weights at 1 to avoid scaling it further.
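The scaling effect can be checked with a small sketch: multiplying every weight by the same factor multiplies the gradient by that factor. The data, the helper name `grad_of_combined_loss`, and the weight values below are all made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 3)   # toy inputs
y = torch.randn(8)      # toy targets

def grad_of_combined_loss(l_mae, l_mse):
    """Gradient of the weighted MAE + MSE loss w.r.t. a fixed linear model."""
    w = torch.ones(3, requires_grad=True)  # same starting parameters every call
    pred = x @ w
    loss = l_mae * F.l1_loss(pred, y) + l_mse * F.mse_loss(pred, y)
    (grad,) = torch.autograd.grad(loss, w)
    return grad

g_sum_1 = grad_of_combined_loss(0.3, 0.7)    # weights sum to 1
g_sum_10 = grad_of_combined_loss(3.0, 7.0)   # same ratio, weights sum to 10

# The direction is unchanged; only the magnitude is scaled by 10.
same_direction = torch.allclose(g_sum_10, 10 * g_sum_1, rtol=1e-4, atol=1e-6)
```

Only the overall scale changes here because the ratio between the weights is kept fixed. Changing that ratio changes the direction of the gradient as well.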

Jimut123 · June 25, 2022, 7:08am

Great, thanks for the reply. I was thinking of using Bayesian optimization to find the optimal weights for the combined loss functions. I was worried about the search domain: some researchers have used values as large as 10^5 for the weights, and I was wondering whether there are any standard constraints. So increasing the weights will just increase the gradients. Hence, it seems better to keep the weights a convex combination (non-negative and summing to 1).
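One possible way to keep searched weights non-negative and summing to 1 is to let the optimizer propose unconstrained values and map them through a softmax. This is only a sketch under that assumption; the helper name `simplex_weights` and the raw values are hypothetical, and no particular Bayesian optimization library is implied.

```python
import math

def simplex_weights(raw):
    """Map unconstrained real numbers to non-negative weights that sum to 1."""
    m = max(raw)                            # subtract the max for numerical stability
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Raw values as a search procedure might propose them (made up here).
weights = simplex_weights([0.0, 2.0, -1.0])
```

With this mapping the search space is unbounded while the resulting weights always form a convex combination, so the overall loss scale stays fixed during the search.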

Also, for very deep architectures, would it help to use a sum greater than 1 to mitigate the vanishing gradient problem? Or is it okay to keep the sum of the weights equal to 1?