I think the weights can sum to any value; a different sum would just scale the loss and thus the gradients.
If you've already adapted the learning rate to the loss scale, you might want to keep the sum of the weights at 1 to avoid scaling it further.
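To make the scaling argument concrete, here is a minimal sketch in plain Python. The two quadratic losses, the weights `0.25`/`0.75`, the factor `c` and the helper names are all made up for illustration. It assumes plain SGD; adaptive optimizers such as Adam normalize the gradient magnitude, so the effect there would differ.

```python
import math

# Two toy losses of a single parameter w, with hand-written gradients.
def grad_l1(w):
    # d/dw of (w - 3)^2
    return 2.0 * (w - 3.0)

def grad_l2(w):
    # d/dw of (w + 1)^2
    return 2.0 * (w + 1.0)

def combined_grad(w, a, b):
    """Gradient of the weighted loss a * L1(w) + b * L2(w)."""
    return a * grad_l1(w) + b * grad_l2(w)

def sgd_step(w, a, b, lr):
    """One plain SGD update on the weighted loss."""
    return w - lr * combined_grad(w, a, b)

w0 = 0.5
c = 10.0  # hypothetical factor by which both weights are multiplied

g_unit = combined_grad(w0, 0.25, 0.75)            # weights sum to 1
g_scaled = combined_grad(w0, 0.25 * c, 0.75 * c)  # same ratio, weights sum to c

# The gradient is scaled by exactly the same factor as the weights.
print(math.isclose(g_scaled, c * g_unit))

# Dividing the learning rate by c undoes the scaling in the update.
w_unit = sgd_step(w0, 0.25, 0.75, lr=0.1)
w_scaled = sgd_step(w0, 0.25 * c, 0.75 * c, lr=0.1 / c)
print(math.isclose(w_unit, w_scaled))
```

So under plain SGD only the ratio between the weights changes the direction of the update; their sum acts like an extra learning-rate factor.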
Great, thanks for the reply. I was thinking of using Bayesian optimization to find the optimal weights for a combination of loss functions. I was worried about the search domain: some researchers used 10^5 as a constraint somewhere, and I was wondering whether there are any standard constraints. So increasing the constants will just increase the gradients. Hence it is better to keep the weights a convex combination (non-negative and summing to 1).
Also, for very deep architectures, would it help to use values greater than 1 to reduce the vanishing gradient problem? Or is it okay to keep the sum of the weights equal to 1?
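On the search-domain point, one possible way to keep the weights summing to 1 is to let the optimizer propose non-negative raw values in a bounded box and normalize them before building the loss. This is only a sketch: `normalize_weights` and the raw values are hypothetical and not tied to any particular Bayesian optimization library.

```python
def normalize_weights(raw):
    """Map non-negative raw values to weights that sum to 1."""
    if any(r < 0 for r in raw):
        raise ValueError("raw weights must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one raw weight must be positive")
    return [r / total for r in raw]

# Hypothetical point proposed by the optimizer, one raw value per loss term.
raw = [2.0, 1.0, 1.0]
weights = normalize_weights(raw)
print(weights, sum(weights))
```

Note that different raw points can map to the same weights (for example `[2, 1, 1]` and `[4, 2, 2]`), so the search space is somewhat redundant; whether that matters depends on the optimizer.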