Multiple losses from two GRU models

I am training a model with multiple outputs in PyTorch, and I have four different losses: positions (in meters), rotations (in degrees), velocity, and a boolean value of 0 or 1 that the model has to predict.
AFAIK, there are two ways to define a final loss function here:

one - the naive sum of the losses;

two - defining a coefficient for each loss and optimizing the weighted sum as the final loss.

More details about the model: I have a GRU model (let's call it GRU1) with two linear output layers, both with ReLU as the activation function:

  • The first layer outputs a sequence of positions (in meters) and rotations (in degrees),
  • The second layer gives me a boolean value of 0 or 1.

I also have another metric, called velocity, which is the output of another GRU model (let's call it GRU2) and an input to GRU1.
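To make the setup concrete, here is a rough sketch of the structure (all layer sizes below are placeholders, not my real ones):

import torch
import torch.nn as nn

class GRU1(nn.Module):
    def __init__(self, feat_size=8, vel_size=3, hidden_size=64, pose_size=6):
        super().__init__()
        # velocity from GRU2 is concatenated with the other input features
        self.gru = nn.GRU(feat_size + vel_size, hidden_size, batch_first=True)
        # first head: positions (meters) and rotations (degrees)
        self.pose_head = nn.Sequential(nn.Linear(hidden_size, pose_size), nn.ReLU())
        # second head: the boolean output
        self.bool_head = nn.Sequential(nn.Linear(hidden_size, 1), nn.ReLU())

    def forward(self, x, velocity):
        out, _ = self.gru(torch.cat([x, velocity], dim=-1))
        return self.pose_head(out), self.bool_head(out)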

So, my question is: how do I weight these losses correctly to obtain the final loss?

What is your application?

Think of the two methods you proposed as one: the naive approach can be seen as the sum of all losses, each with a coefficient of 1.

total_loss = (p_bias * p_loss) + (r_bias * r_loss) + (v_bias * v_loss) + (b_bias * b_loss)

The biases above are simply hyperparameters to be tuned at training time; there is no right or wrong answer to what they should be, and the choice will be guided by your application (use case).
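As a minimal sketch of what this could look like in PyTorch (the loss choices, tensor shapes, and weight values below are assumptions for illustration, not recommendations):

import torch
import torch.nn as nn

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()  # assuming the boolean head produces a logit

# hypothetical predictions and targets; shapes are made up
p_pred = torch.randn(4, 3, requires_grad=True); p_tgt = torch.randn(4, 3)  # positions (m)
r_pred = torch.randn(4, 3, requires_grad=True); r_tgt = torch.randn(4, 3)  # rotations (deg)
v_pred = torch.randn(4, 3, requires_grad=True); v_tgt = torch.randn(4, 3)  # velocity
b_pred = torch.randn(4, 1, requires_grad=True); b_tgt = torch.randint(0, 2, (4, 1)).float()

# the biases are hyperparameters to tune for your application
p_bias, r_bias, v_bias, b_bias = 1.0, 1.0, 0.5, 0.5

total_loss = (p_bias * mse(p_pred, p_tgt)
              + r_bias * mse(r_pred, r_tgt)
              + v_bias * mse(v_pred, v_tgt)
              + b_bias * bce(b_pred, b_tgt))
total_loss.backward()  # gradients flow through all four terms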

Thank you for your comment. I'm going to use this model later in the Unity engine.
Since they are different quantities, I supposed there should be some hyperparameters to indicate their importance. I think the coefficient for both positions and rotations will be 1, since they both play the main role in my case. On the other hand, they are not comparable, in the sense that one is expressed in meters and the other in degrees, so I am confused about whether I have to give them coefficients or not.
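To illustrate my confusion with made-up numbers: the same 5% relative error produces very different MSE magnitudes depending on the unit:

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()

# 5% error on 1 m vs. 5% error on 60 degrees
print(loss_fn(torch.tensor([1.05]), torch.tensor([1.0])))   # tensor(0.0025)
print(loss_fn(torch.tensor([63.0]), torch.tensor([60.0])))  # tensor(9.)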

I don’t fully understand. Your loss function should compute the loss between your prediction and the target; it shouldn’t matter what kind of data your model is outputting.

for example:

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()

model_out = torch.tensor([[50.0]])  # degrees
target = torch.tensor([[45.0]])     # degrees

print(loss_fn(model_out, target))
>>> tensor(25.)

If you don’t give a coefficient, it will be treated as if the coefficient were 1.

I understand what you mean about my wrong assumption, but honestly I haven't gotten my answer yet.

I was reading some papers on cases similar to mine, and I see that they define the hyperparameters for their various losses: binary cross-entropy for the boolean one, and coefficients for the others drawn from a random weighting function.
I would be glad if you or anyone else had any comments on this.
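Something like this, as far as I can tell (the softmax-over-noise choice below is just my assumption of how such a random weighting function might be implemented):

import torch

# hypothetical scalar losses for the four outputs; values are made up
p_loss, r_loss, v_loss, b_loss = [torch.rand(()) for _ in range(4)]

# draw fresh random weights each training step, normalized to sum to 1
w = torch.softmax(torch.randn(4), dim=0)
total_loss = w[0] * p_loss + w[1] * r_loss + w[2] * v_loss + w[3] * b_loss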