For my model, the total loss function is of the form
L = c1*L1 + c2*L2 + c3*L3 + …, where c1, c2, c3, … are meant to be trainable coefficients and L1, L2, L3, … are the individual loss terms.
Since these coefficients are outside my model's trainable weights, how can I train them so as to minimize the loss function L?
Thank you in advance.
As you observe, the coefficients aren’t part of the training, and for good reason: if your losses are all positive, setting every coefficient to 0 gives you L = 0, and you could even push them negative, at which point L improves as L1, … increase.
This is why the coefficients are not, and cannot be, part of the optimization of L itself. They are what is termed hyper-parameters (as opposed to normal, trainable parameters), similar to the learning rate etc.
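To make the argument concrete, here is a minimal sketch in plain Python. It assumes two loss terms held at fixed positive values (the numbers are made up) and runs gradient descent on the coefficients only. The helper name `descend_coefficients` is hypothetical.

```python
# Toy check: treat the coefficients as trainable and run plain gradient
# descent on L = c1*L1 + c2*L2 while the loss terms stay fixed.
def descend_coefficients(loss_terms, coeffs, lr=0.1, steps=50):
    coeffs = list(coeffs)
    for _ in range(steps):
        # dL/dc_i = L_i, so every step moves c_i by -lr * L_i.
        coeffs = [c - lr * l for c, l in zip(coeffs, loss_terms)]
    total = sum(c * l for c, l in zip(coeffs, loss_terms))
    return coeffs, total

# Assumed positive loss values, both coefficients starting at 1.0.
coeffs, total = descend_coefficients([2.0, 0.5], [1.0, 1.0])
print(coeffs, total)  # both coefficients end up negative, and so does L
```

Nothing stops the descent: the gradient with respect to each coefficient is just the (positive) loss term, so L keeps decreasing without the model getting any better.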
Now what is commonly done is to take some validation metric (e.g. accuracy for classification) and then try to find c1, … that lead to good results in that metric. Various approaches exist, from manual search and randomly varying them to “learning to learn” approaches and Bayesian optimization schemes.
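As a rough sketch of the random-search variant: the function `train_and_validate` below is a hypothetical stand-in for "train with these coefficients and return the validation metric", and the search range is an assumption. Sampling the exponent rather than the value itself is one way to cover small decimal coefficients evenly.

```python
import math
import random

def train_and_validate(coeffs):
    # Hypothetical stand-in. In a real run this would train the model with
    # L = sum(c_i * L_i) and return a validation metric (higher is better).
    # Here: a toy score that peaks at coeffs = (1.0, 0.1).
    targets = (1.0, 0.1)
    return -sum((math.log10(c) - math.log10(t)) ** 2
                for c, t in zip(coeffs, targets))

def random_search(n_trials, low=1e-3, high=10.0, n_coeffs=2, seed=0):
    rng = random.Random(seed)
    best_coeffs, best_score = None, -math.inf
    for _ in range(n_trials):
        # Sample exponents uniformly, so 0.001..0.01 gets as many draws as 1..10.
        coeffs = [10 ** rng.uniform(math.log10(low), math.log10(high))
                  for _ in range(n_coeffs)]
        score = train_and_validate(coeffs)
        if score > best_score:
            best_coeffs, best_score = coeffs, score
    return best_coeffs, best_score

best_coeffs, best_score = random_search(n_trials=200)
print(best_coeffs, best_score)
```

The coefficients are picked by the validation score, not by the value of L, which is what keeps them from collapsing to zero.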
Randomly varying them becomes extremely difficult when the coefficients are real-valued (decimal) numbers. Isn’t there a way to derive the coefficients so that at least one of them is non-zero at the minimum of the loss function?
I don’t think there is a much better approach than treating them as hyper-parameters, at least not in this generality. In particular, you won’t get around having some other metric of “was this a good pick”. You’re not required to vary them randomly; using intuition often works too.
When you get to specific losses that have specific interpretations, things may improve (e.g. there might be probabilistic interpretations, or in very specific cases it may be useful to try to achieve similar orders of magnitude in the loss contributions).
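For the "similar orders of magnitude" idea, a minimal sketch could look like the following. It assumes you have measured each loss term once (say, on the first batch); the numbers and the helper name `balance_coefficients` are made up for illustration. The coefficients are then kept fixed, not trained.

```python
def balance_coefficients(loss_values, eps=1e-12):
    """Return c_i = 1 / |L_i| so that each product c_i * L_i is about 1."""
    return [1.0 / max(abs(v), eps) for v in loss_values]

# Illustrative initial loss values, not from any real run.
initial_losses = [250.0, 0.04, 3.0]
coeffs = balance_coefficients(initial_losses)

# Each term now contributes roughly equally to the total at the start.
contributions = [c * l for c, l in zip(coeffs, initial_losses)]
print(coeffs, contributions)
```

Whether equal contributions are actually what you want still has to be judged against a validation metric.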
Is there any way to solve this problem?
I need to define a loss
L = c1*L1 + c2*L2 where c1 and c2 are trainable coefficients. How can I define them as trainable parameters?
As mentioned in the question, “Since these coefficient parameters are outside my model trainable weights, how can I train these coefficients so as to minimize the loss function L?” Is there any way?
Thank you in advance.
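Mechanically, extra trainable values can be wrapped in `torch.nn.Parameter` and handed to the optimizer together with the model weights. As explained earlier in the thread, minimizing c1*L1 + c2*L2 directly would just push the coefficients down, so the sketch below swaps in a different objective: c_i = exp(-s_i) plus a penalty + s_i, in the spirit of the uncertainty weighting of Kendall et al. This is an assumption about what might fit your case, not something proposed in this thread. The class name `WeightedLoss`, the toy model, and the two loss terms are all hypothetical.

```python
import torch
import torch.nn as nn

class WeightedLoss(nn.Module):
    """Combine loss terms with learnable weights c_i = exp(-s_i).

    The extra + s_i term penalises large s_i (i.e. small c_i), so the
    weights cannot simply shrink to zero.
    """
    def __init__(self, n_terms):
        super().__init__()
        # Unconstrained log-scale parameters, initialised so that c_i = 1.
        self.log_scales = nn.Parameter(torch.zeros(n_terms))

    def forward(self, losses):
        losses = torch.stack(losses)           # shape (n_terms,)
        coeffs = torch.exp(-self.log_scales)   # always positive
        return (coeffs * losses + self.log_scales).sum()

torch.manual_seed(0)
model = nn.Linear(3, 2)                        # hypothetical toy model
criterion = WeightedLoss(n_terms=2)

# Pass both the model weights and the loss weights to the optimizer.
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(criterion.parameters()), lr=0.1
)

x, target = torch.randn(8, 3), torch.randn(8, 2)
out = model(x)
l1 = nn.functional.mse_loss(out, target)       # hypothetical loss term 1
l2 = out.abs().mean()                          # hypothetical loss term 2

loss = criterion([l1, l2])
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

For a fixed loss value L_i, the term exp(-s_i)*L_i + s_i is minimized at c_i = 1/L_i, so this effectively balances the magnitudes of the terms. It does not tell you whether that balance is good for your task, so a validation metric is still needed.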