How to combine multiple criterions into a loss function?

Yup, you can certainly weigh the losses however you see fit. What you’re doing in your example should be fine.
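
For illustration, a minimal sketch of such a weighted combination (the model, criteria, and weights here are made up, not from the original example):

import torch
import torch.nn as nn

net = nn.Linear(10, 10)
out = net(torch.randn(4, 10))
target = torch.randn(4, 10)

# Two criteria on the same output, combined with fixed weights
loss = 0.7 * nn.MSELoss()(out, target) + 0.3 * nn.L1Loss()(out, target)
loss.backward()  # gradients flow through both terms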


What if my second loss function requires some value computed by the first loss (or even the gradient of the first loss)? In that case I can't just add the two losses together; I have to take their gradients separately. And retain_graph=True gives wrong results as well; the intermediate grads are not correct.
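
One standard pattern for this kind of dependency (a sketch with made-up shapes, not from this thread): torch.autograd.grad with create_graph=True keeps the first loss's gradient differentiable, so it can feed into a second loss and the combined loss can still be backpropagated.

import torch

x = torch.randn(8, 4, requires_grad=True)
loss1 = (x ** 2).sum()

# create_graph=True keeps the graph, so grad1 is itself differentiable
grad1, = torch.autograd.grad(loss1, x, create_graph=True)
loss2 = grad1.norm()  # second loss built from the gradient of the first

(loss1 + loss2).backward()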

see this one


Thank you very much, Jordan! Your answer helped me combine my custom loss function with nn.Loss.

Hi BikashgG, I think you could use .type() to change the tensor's type from torch.FloatTensor to torch.LongTensor.

I faced a similar problem when I used CrossEntropyLoss().

Check the official documentation for this loss function; there is a requirement on the dtype of the tensors you feed in.

Hope it helps,

Peter
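
For reference, a small sketch of the dtype fix Peter describes (the shapes here are made up): nn.CrossEntropyLoss expects the targets to be class indices of type torch.long.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)                  # 4 samples, 10 classes
target = torch.tensor([1.0, 0.0, 3.0, 7.0])  # float targets raise a dtype error

# Convert to class indices, e.g. via .long() or .type(torch.LongTensor)
loss = criterion(logits, target.long())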

What if the losses are computed over different parts of the network, say loss1 is for first 3 layers and loss2 is for first 7 layers (incl. the first 3)? Wouldn’t the sum of losses method also backprop loss1 through layers 4-7? Would calling loss1.backward() and loss2.backward() separately be recommended in that case?
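
A sketch of the setup being asked about (the layer split and shapes are invented): because loss1's graph ends at the intermediate activation, summing the losses does not backprop loss1 through the later layers.

import torch
import torch.nn as nn

stage1 = nn.Sequential(*[nn.Linear(16, 16) for _ in range(3)])  # layers 1-3
stage2 = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])  # layers 4-7
crit = nn.MSELoss()

x = torch.randn(8, 16)
h = stage1(x)      # intermediate activation feeding loss1
out = stage2(h)    # final output feeding loss2

# loss1 only touches stage1's parameters; stage2 receives loss2's gradient only
loss = crit(h, torch.zeros_like(h)) + crit(out, torch.zeros_like(out))
loss.backward()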


import torch.nn as nn

# inp, target1, target2 and the scalar weights are assumed defined elsewhere
mse_loss = nn.MSELoss(reduction='mean')  # size_average=True is deprecated
a = weight1 * mse_loss(inp, target1)
b = weight2 * mse_loss(inp, target2)
loss = a + b
loss.backward()

Hope it helps!

What if I want to learn weight1 and weight2 during the training process?
Should they be declared as parameters of the two models? Or of a third one?


Learning them would not make sense to me: the gradient of the total loss with respect to each weight is just the corresponding partial loss, so the network would simply learn to shrink the weights, which automatically minimizes the weighted partial losses and thus the total loss.


Should the weight be an int or a tensor? Or do both work?

An int, since you're weighting each type of loss rather than each value individually.

You could add another loss term that penalizes the model for learning weight1 = weight2 = 0.
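
One concrete way to do that (a sketch of the uncertainty-weighting trick; the model and partial losses are stand-ins, and this is my own example rather than something from this thread): learn log-variances s1 and s2, where exp(-s) acts as the loss weight and the "+ s" terms penalize collapsing the weights to zero.

import torch
import torch.nn as nn

net = nn.Linear(4, 4)                # stand-in model
s1 = nn.Parameter(torch.zeros(()))   # learnable log-variances
s2 = nn.Parameter(torch.zeros(()))
opt = torch.optim.Adam(list(net.parameters()) + [s1, s2], lr=1e-3)

out = net(torch.randn(8, 4))
loss1 = out.pow(2).mean()            # stand-in partial losses
loss2 = out.abs().mean()

# exp(-s) weights each loss; the "+ s" terms keep the weights from collapsing
loss = torch.exp(-s1) * loss1 + s1 + torch.exp(-s2) * loss2 + s2
opt.zero_grad()
loss.backward()
opt.step()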

Did you learn how to do this?

What is the preferred way to combine loss functions between:

output = net(input)
loss1 = w1 * crit1(output, target1)
loss2 = w2 * crit2(output, target2)
loss = loss1 + loss2
loss.backward()
print(loss.item())

and

output = net(input)
loss1 = w1 * crit1(output, target1)
loss1.backward(retain_graph=True)  # needed: loss2 reuses the same graph through output
loss2 = w2 * crit2(output, target2)
loss2.backward()
print(loss1.item() + loss2.item())

Is there a conceptual difference and/or computational advantage in terms of speed/memory in any of the two?

Thanks in advance 🙂
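
As a quick sanity check (my own sketch, not from the thread): both variants produce identical gradients, because backward() accumulates into .grad; note the second variant needs retain_graph=True when both losses share one forward pass.

import torch
import torch.nn as nn

net = nn.Linear(4, 4)
x, t1, t2 = torch.randn(2, 4), torch.randn(2, 4), torch.randn(2, 4)
crit = nn.MSELoss()

# Variant 1: single backward through the summed loss
out = net(x)
(crit(out, t1) + crit(out, t2)).backward()
g_sum = net.weight.grad.clone()

# Variant 2: two separate backward calls; gradients accumulate
net.zero_grad()
out = net(x)
crit(out, t1).backward(retain_graph=True)  # graph is shared with the second loss
crit(out, t2).backward()

assert torch.allclose(g_sum, net.weight.grad)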


In the DCGAN Tutorial, the second method is preferred when making the two forward passes with real and fake data through the discriminator. Any friendly soul willing to comment and explain why this is the best method?
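
A condensed, self-contained sketch of that pattern (the tiny discriminator and data here are stand-ins, not the tutorial's actual setup): each forward pass builds its own graph, so the two backward calls need no retain_graph, and each graph can be freed right after its backward pass.

import torch
import torch.nn as nn

netD = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())  # stand-in discriminator
criterion = nn.BCELoss()
optimizerD = torch.optim.SGD(netD.parameters(), lr=0.01)

real = torch.randn(4, 10)
fake = torch.randn(4, 10)  # in the tutorial this comes from the generator
real_label = torch.ones(4, 1)
fake_label = torch.zeros(4, 1)

# Two forward/backward passes; gradients from each call accumulate in .grad
netD.zero_grad()
criterion(netD(real), real_label).backward()
criterion(netD(fake.detach()), fake_label).backward()
optimizerD.step()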


Did you find it out?

Hi, I was working on a multi-loss problem and faced the following issue:

loss1 = criterion1()  # some loss computed elsewhere
loss2 = criterion2()  # some loss computed elsewhere
NetLoss = loss1 + loss2
running_loss1 = loss1.item()  # reassigned each iteration
running_loss2 = loss2.item()
net_loss += NetLoss.item()    # accumulated across iterations

However, for me,
net_loss > running_loss1 + running_loss2

Can this be possible?

Since you are accumulating the losses in net_loss via += across iterations, while running_loss1 and running_loss2 are reassigned each iteration, your comparison can indeed be true.
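
A small sketch of the fix (with stand-in losses): accumulate all three quantities the same way and the totals agree up to floating-point rounding.

import torch

net_loss = running_loss1 = running_loss2 = 0.0
for _ in range(10):
    loss1 = torch.rand(1)  # stand-ins for criterion1/criterion2 outputs
    loss2 = torch.rand(1)
    running_loss1 += loss1.item()
    running_loss2 += loss2.item()
    net_loss += (loss1 + loss2).item()

assert abs(net_loss - (running_loss1 + running_loss2)) < 1e-5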

Oh, my bad.

Thanks 🙂

How do we then make the weights learnable? I was looking at this discussion. Won't those weights also be minimized? Or, because they are using nn.Parameter(), will it be OK? @ptrblck

Basically: I understand that making the weights learnable is a bad idea, because the model will find that driving the weights to zero is the best optimization. However, what I don't understand is what they are doing in the discussion linked above.

I went through the discussion again, and I don't think the weights are learnable; they instead weight the individual losses and thus scale the gradients.

I might be wrong, so feel free to point me to the right code.
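
A tiny check of that reading (my own example): a constant weight on a loss simply scales its gradients.

import torch

w = torch.tensor(3.0)
x = torch.randn(5, requires_grad=True)
loss = x.pow(2).sum()    # d(loss)/dx = 2x
(w * loss).backward()

assert torch.allclose(x.grad, w * 2 * x)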

I would also assume the same.