What should be the ideal constraint for calculating weighted loss?

We can calculate a weighted loss by defining a custom loss like this:

custom_loss = l_1 * loss_1 + l_2 * loss_2 + … + l_N * loss_N

Say we are using loss_1 as MAE and loss_2 as MSE, etc.
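For concreteness, here is a minimal sketch of such a combined loss in PyTorch, assuming MAE and MSE as the two terms; the weight values are just placeholders (choosing them is exactly the question):

```python
import torch
import torch.nn as nn

# Sketch of the weighted combination above, with loss_1 = MAE and loss_2 = MSE.
mae = nn.L1Loss()
mse = nn.MSELoss()

l_1, l_2 = 0.7, 0.3  # example weights; whether they must sum to 1 is the question

def custom_loss(pred, target):
    return l_1 * mae(pred, target) + l_2 * mse(pred, target)

# Dummy usage
pred = torch.randn(8, 1, requires_grad=True)
target = torch.randn(8, 1)
loss = custom_loss(pred, target)
loss.backward()
```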

I wanted to know: does the constraint on the l_i's always need to be that they sum to 1 for a weighted average?

l_1 + l_2 + … + l_N = 1

Or could the sum be greater than 1?

What are the pros and cons of choosing a sum larger than 1?

Also note that I will be using the Adam optimizer for this.

I think the weights can sum to any value; it would just scale the loss and thus the gradients.
If you've already adapted the learning rate to the loss scale, you might want to keep the sum of the weights at 1 to avoid scaling it further.
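A quick sketch (not part of the original reply) illustrating that point: multiplying all weights by a constant multiplies the loss, and hence every gradient, by the same constant, so a learning rate that was tuned for the old scale is effectively rescaled. The tiny linear model here is just a stand-in.

```python
import torch

# Toy setup: a linear model with an MAE + MSE combined loss.
x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)
target = torch.randn(4, 1)

def combined_loss(weights):
    pred = x @ w
    mae = (pred - target).abs().mean()
    mse = ((pred - target) ** 2).mean()
    return weights[0] * mae + weights[1] * mse

# Weights summing to 1 vs. the same weights scaled by 10.
for ws in [(0.5, 0.5), (5.0, 5.0)]:
    if w.grad is not None:
        w.grad = None
    combined_loss(ws).backward()
    print(ws, w.grad.norm().item())  # second gradient norm is ~10x the first
```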


Great, thanks for the reply. I was thinking of using Bayesian optimization to find the optimal weights for the combined loss. I was worried about the search domain: some researchers use bounds as large as 10^5, and I was wondering whether there is a standard constraint. So increasing the constants will just increase the gradients; hence it seems better to keep the weights as a convex combination (summing to 1).
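If it helps, one hedged way to combine the two ideas is to let the Bayesian optimizer search over any positive domain and then project the proposal onto the simplex before building the loss; `normalize_weights` below is a hypothetical helper, not a standard API.

```python
import numpy as np

# Hypothetical helper: normalize whatever positive weights the optimizer
# proposes so that they sum to 1 before they are used in the combined loss.
def normalize_weights(raw):
    raw = np.asarray(raw, dtype=float)
    return raw / raw.sum()

candidate = [3.2, 0.4, 1.1]          # e.g. a point proposed by Bayesian optimization
weights = normalize_weights(candidate)
print(weights, weights.sum())        # weights now sum to 1.0
```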

Also, in the case of very deep architectures, would it be convenient to use a sum greater than 1 to mitigate the vanishing gradient problem? Or is it okay to keep the sum of the weights equal to 1?