How can I combine multiple criterions into one loss function?

mse_loss = nn.MSELoss(reduction='mean')  # size_average is deprecated
a = weight1 * mse_loss(inp, target1)
b = weight2 * mse_loss(inp, target2)
loss = a + b
loss.backward()

What if I want to learn the weight1 and weight2 during the training process?
Should they be declared parameters of the two models? Or of a third one?

Learning them directly would not make sense, since the network would simply drive both weights toward zero, which minimizes the weighted partial losses (and thus the total loss) without actually improving the model.

Should the weight be an int or a tensor? Or do both work?

A plain Python number works, since you’re weighting each loss term as a whole instead of each value individually.

You could add another loss term that penalizes the model for learning weight1 = weight2 = 0.
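One concrete way to implement such a penalty is uncertainty-based weighting (Kendall et al., 2018): instead of raw weights you learn log-variances s1, s2, where the exp(-s) factor scales each loss and the +s term penalizes inflating s, so the effective weights cannot collapse to zero for free. A minimal sketch, not the original poster’s code:

```python
import torch
import torch.nn as nn

# Learnable log-variances; exp(-s) acts as the loss weight,
# and the +s regularizer stops the weight from going to zero for free.
s1 = nn.Parameter(torch.zeros(()))
s2 = nn.Parameter(torch.zeros(()))

def combined_loss(loss1, loss2):
    return torch.exp(-s1) * loss1 + s1 + torch.exp(-s2) * loss2 + s2

loss = combined_loss(torch.tensor(2.0), torch.tensor(3.0))
loss.backward()  # gradients flow into s1 and s2 as well
```

In a training loop you would simply add s1 and s2 to the optimizer’s parameter list alongside the model parameters.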

Did you learn how to do this?

What is the preferred way to combine loss functions between:

output = net(input)
loss1 = w1 * crit1(output, target1)
loss2 = w2 * crit2(output, target2)
loss = loss1 + loss2
loss.backward()
print(loss.item())

and

output = net(input)
loss1 = w1 * crit1(output, target1)
loss1.backward()
loss2 = w2 * crit2(output, target2)
loss2.backward()
print(loss1.item() + loss2.item())

Is there a conceptual difference and/or computational advantage in terms of speed/memory in any of the two?

Thanks in advance :slight_smile:
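For what it’s worth, the two variants produce identical gradients, because backward() accumulates into .grad; summing first just traverses one combined graph. One caveat: with a single forward pass, the second variant needs retain_graph=True on the first backward, or the freed graph raises an error. A small sketch checking the equivalence:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)
inp = torch.randn(3, 4)
t1, t2 = torch.randn(3, 2), torch.randn(3, 2)
crit = nn.MSELoss()
w1, w2 = 0.7, 0.3

# Variant 1: sum the losses, then one backward pass.
net.zero_grad()
out = net(inp)
(w1 * crit(out, t1) + w2 * crit(out, t2)).backward()
g_sum = net.weight.grad.clone()

# Variant 2: two separate backward calls; retain_graph=True is needed
# because both losses share the same forward graph here.
net.zero_grad()
out = net(inp)
(w1 * crit(out, t1)).backward(retain_graph=True)
(w2 * crit(out, t2)).backward()
g_sep = net.weight.grad.clone()
```

Both gradient tensors match, so the choice is about memory and convenience rather than correctness.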

In the DCGAN Tutorial the preferred methodology is the second one, when making the two forward passes with real and fake data through the discriminator … any friendly soul willing to comment and explain why this is the best method?
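My understanding (a sketch of the pattern, not the tutorial’s exact code): in the DCGAN case the two losses come from two separate forward passes, so their graphs are independent. Calling backward() right after each forward frees that graph’s intermediate buffers before the next forward runs, lowering peak memory, while the accumulated gradients are identical to backwarding the sum:

```python
import torch
import torch.nn as nn

# Toy discriminator standing in for the DCGAN one.
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
crit = nn.BCELoss()
real, fake = torch.randn(4, 8), torch.randn(4, 8)

D.zero_grad()
err_real = crit(D(real), torch.ones(4, 1))
err_real.backward()   # buffers of the real-batch graph are freed here
err_fake = crit(D(fake), torch.zeros(4, 1))
err_fake.backward()   # gradients add onto those from err_real
errD = err_real.item() + err_fake.item()
```

No retain_graph is needed, since each backward consumes its own graph.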

Did you find it out?

Hi, I was working with multi loss problem. I faced the following issue:

loss1 = criterion1() #something
loss2 = criterion2() #something
NetLoss = loss1 + loss2
running_loss1 = loss1.item()
running_loss2 = loss2.item()
net_loss += NetLoss.item()

However, for me,
net_loss > running_loss1+running_loss2

Can this be possible?

Since you are accumulating the losses in net_loss via +=, the comparison can indeed be true: net_loss holds the sum over all batches so far, while running_loss1 and running_loss2 hold only the current batch.

Oh, my bad.

Thanks :slight_smile:
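To make that concrete: per batch, the identity holds exactly; the inequality only appears because net_loss accumulates across iterations. A minimal sketch:

```python
import math
import torch

loss1 = torch.tensor(0.5)
loss2 = torch.tensor(0.25)
NetLoss = loss1 + loss2

# Within a single batch the sum matches exactly.
assert math.isclose(NetLoss.item(), loss1.item() + loss2.item())

# But an accumulator grows across batches, so after a few iterations
# net_loss exceeds any single batch's running losses.
net_loss = 0.0
for _ in range(3):
    net_loss += NetLoss.item()
```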

How do we then make the weights learnable? I was looking at this discussion. Won’t those weights also be minimized? Or is it OK because they are using nn.Parameter()? @ptrblck

Basically: I understand that making the weights learnable is a bad idea, because the model will find that driving the weights to zero is the best optimization. What I don’t understand is what they are doing in the discussion linked above.

I went through the discussion again and I don’t think the weights are learnable, but are instead weighting the individual losses and thus scaling the gradients.

I might be wrong, so feel free to point me to the right code.

I would also assume the same.

Hi, I am trying to replicate the results of the pix2pix conditional GAN. In the paper, the loss contains an L1 term. Here is the objective function from the paper (link: https://arxiv.org/pdf/1611.07004.pdf):

G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G)

Since the L1 loss affects only the generator, I used the loss functions below.

bce = nn.BCELoss().to(device)
L1 = nn.L1Loss().to(device)
k = 50  # L1 weight; the paper uses lambda = 100

def criterion_G(y_hat, y):
    loss = bce(y_hat, y) + k * L1(y_hat, y)
    return loss

def criterion_D(y_hat, y):
    loss = bce(y_hat, y)
    return loss

But the model does not seem to be training properly (maybe that is normal). Can someone help me with this?
How can I implement the above mentioned loss ?
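One likely issue (a sketch under assumed names and shapes, not a definitive fix): in pix2pix the BCE term compares the discriminator’s score on the fake image against “real” labels, while the L1 term compares the generated image with the ground-truth target image, so the two terms take different pairs of tensors rather than the same (y_hat, y):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
l1 = nn.L1Loss()
lam = 100.0  # the paper's lambda

def generator_loss(d_fake_score, fake_img, target_img):
    # Adversarial term: fool the discriminator into predicting "real".
    adv = bce(d_fake_score, torch.ones_like(d_fake_score))
    # Reconstruction term: pixel-wise L1 against the ground truth.
    return adv + lam * l1(fake_img, target_img)

# Toy usage with made-up shapes.
score = torch.full((2, 1), 0.5)
fake = torch.zeros(2, 3, 8, 8)
target = torch.zeros(2, 3, 8, 8)
loss = generator_loss(score, fake, target)
```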

@ptrblck wouldn’t averaging the combined criterions and multiplying by a normalizing value be a better option?

I don’t know which approach you are referring to, as the discussion seems to mention different use cases.
Could you give an example of your use case and compare it to the “worse” approach?

Would learning the weights make sense if we formulate the loss as:

loss_a = loss_a * learned_weight
loss_b = loss_b / learned_weight
loss = loss_a + loss_b
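This formulation does remove the collapse incentive: driving learned_weight to zero makes the second term blow up, and the minimum of f(w) = a·w + b/w sits at w = sqrt(b/a), which balances the two losses rather than zeroing them. A quick sketch with made-up constants:

```python
import torch

# learned_weight as a parameter; a and b stand in for the two losses.
w = torch.nn.Parameter(torch.tensor(1.0))
a, b = torch.tensor(4.0), torch.tensor(1.0)

loss = a * w + b / w
loss.backward()
# df/dw = a - b / w**2 = 4 - 1 = 3 at w = 1, pushing w toward sqrt(b/a) = 0.5
```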

This is an interesting idea and I would recommend to go for it. :slight_smile:
Feel free to post an update, if you have some insights into this kind of penalty weighting.

Did that work? @Eduard_Kieser