How to combine multiple criterions into a loss function?

I'm building recursive neural nets, so I'll have a variable-length list of criterions. I want to sum them up and backpropagate the error.
All the errors are single float values and on the same scale, so the total loss is the sum of the individual losses.

To understand this, I distilled the problem into a minimal example (it's from another classification project).
I wrote this code and it works:

def loss_calc(data, targets):
  data = Variable(torch.FloatTensor(data)).cuda()
  targets = Variable(torch.LongTensor(targets)).cuda()
  output = model(data)
  final = output[-1, :, :]
  loss = criterion(final, targets)
  return loss

Now I want to know how I can make a list of criterions and add them for backpropagation.
I tried these, but they don't work.

def loss_calc(data, targets):
  data = Variable(torch.FloatTensor(data)).cuda()
  targets = Variable(torch.LongTensor(targets)).cuda()
  output = model(data)
  final = output[-1, :, :]
  loss = 0.0
  for b in range(batch_size):
    loss += criterion(final[b], targets[b])
  return loss

It doesn't return errors, but the network simply won't train.

This alternative results in an error:

def loss_calc(data, targets):
  data = Variable(torch.FloatTensor(data)).cuda()
  targets = Variable(torch.LongTensor(targets)).cuda()
  output = model(data)
  final = output[-1, :, :]
  loss = []
  for b in range(batch_size):
    loss.append(criterion(final[b], targets[b]))
  loss = torch.sum(loss)
  return loss

Note that this is a dummy example. Once I understand how to fix this, I can apply it to the recursive neural nets.

For the final project, the sequence of losses has arbitrary length. For example, sometimes it's adding 4 losses, other times 6. So pre-allocating a Tensor and saving the losses into it is not an option.

11 Likes

Yeah, you can optimize a variable number of losses without a problem. I've had an application like that, and I just used total_loss = sum(losses), where losses is a list of all the costs. Calling .backward() on that should do it. Note that you can't expect torch.sum to work with lists - it's a method for Tensors. As I pointed out above, you can use the Python builtin sum (it will just call the + operator on all the elements, effectively adding up all the losses into a single one).
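For example, a minimal sketch (the names outputs, targets_list, criterion, and optimizer here are just placeholders, not from the code above):

losses = []
for out, tgt in zip(outputs, targets_list):   # however many loss terms this iteration produces
    losses.append(criterion(out, tgt))

total_loss = sum(losses)   # Python builtin sum(), not torch.sum(); adds the loss values with +
optimizer.zero_grad()
total_loss.backward()      # backpropagates through all the summed losses at once
optimizer.step()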

Iterating over a batch and summing up the losses should work too, but is unnecessary, since the criterions already support batched inputs. It probably doesn't work for you because criterions average the loss over the batch instead of summing it (this can be changed using the size_average constructor argument). So if you have very large batches, your gradients will get multiplied by the batch size and will likely blow up your network. Just divide the loss by the batch size and it should be fine.
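A rough sketch of that, reusing the names from your per-sample loop above:

loss = 0.0
for b in range(batch_size):
    loss += criterion(final[b], targets[b])   # accumulate per-sample losses
loss = loss / batch_size                      # rescale so the gradient magnitude matches the batched call
loss.backward()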

33 Likes

Thank you. That helped a lot.

I know I can batch this criterion, but I'm doing it this way to understand how PyTorch works.
And this explained a lot to me, so thank you very much.

@apaszke

Is it possible to add losses of multiple types provided by torch.nn?

For example, I have a network with two outputs and corresponding losses - cross_entropy2d and cross-entropy. I tried the following, but it throws an error about incompatible types. Is there a way around it?

b = cross_entropy2d(output_x, x_labels)
a = nn.CrossEntropyLoss(output_y, y_labels)
loss = a + b

Additional question - is it possible to weight the losses, i.e. total_loss = a + alpha * b?

6 Likes

You should be using F.cross_entropy, not the nn.Module form (or you can write your second line as a = nn.CrossEntropyLoss()(output_y, y_labels)). Right now you're just creating a module, not actually computing the loss.
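In other words, something like this (a sketch; output_y and y_labels as in your snippet):

import torch.nn.functional as F

# functional form: computes the loss value directly
a = F.cross_entropy(output_y, y_labels)

# equivalent module form: construct the criterion, then call it on the inputs
a = nn.CrossEntropyLoss()(output_y, y_labels)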

7 Likes

@jekbradbury

Thanks a lot.

Still, the question remains: can one do what @meetshah1995 asked with 2 different loss criteria?

b = nn.MSELoss(output_x, x_labels)
a = nn.CrossEntropyLoss(output_y, y_labels)
loss = a + b

Doing that is fine; it would be:

b = nn.MSELoss()(output_x, x_labels)
a = nn.CrossEntropyLoss()(output_y, y_labels)
loss = a + b

loss.backward()

Note the additional parentheses, as James mentioned above.

This is equivalent to:

b = nn.MSELoss()
a = nn.CrossEntropyLoss()

loss_a = a(output_x, x_labels)
loss_b = b(output_y, y_labels)

loss = loss_a + loss_b

loss.backward()
19 Likes

Hi, I have another question.
I want to sum multiple losses that all use the same loss function.

1:
criterion = nn.MSELoss().cuda()
loss = criterion(a, label) + criterion(c, label)

2:
criterion1, criterion2 = nn.MSELoss().cuda(), nn.MSELoss().cuda()
loss = criterion1(a, label) + criterion2(c, label)

Which way should I take? Thanks.

1 Like

Both give you the same result. I'd say (1) is simpler.

12 Likes

I did this but got the following error:

add received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:

  • (float value)
    didn’t match because some of the arguments have invalid types: (torch.FloatTensor)
  • (torch.cuda.FloatTensor other)
    didn’t match because some of the arguments have invalid types: (torch.FloatTensor)
  • (torch.cuda.sparse.FloatTensor other)
    didn’t match because some of the arguments have invalid types: (torch.FloatTensor)
  • (float value, torch.cuda.FloatTensor other)
  • (float value, torch.cuda.sparse.FloatTensor other)

Both my losses are of the same type - nn.CrossEntropyLoss() <class 'torch.autograd.variable.Variable'>

Any suggestion on how to resolve this please?

If you look closely at the error message, you'll see that one of the operands is on the GPU (torch.cuda.FloatTensor) and the other is not (torch.FloatTensor).
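For example (a small sketch; loss_a and loss_b are placeholder names, assuming loss_a ended up on the GPU and loss_b on the CPU):

# move both terms to the same device before adding them
loss = loss_a + loss_b.cuda()   # or: loss = loss_a.cpu() + loss_b
loss.backward()

In practice it's usually cleaner to move the model and all inputs to the same device up front, so both losses are computed there in the first place.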

Best regards

Thomas

6 Likes

Yes, thank you @tom. I am very much a novice here.

Is it possible to weight the losses when multiple losses are used, and how? Is the following code correct?

    mse_loss = nn.MSELoss(size_average=True)
    a = weight1 * mse_loss(inp, target1)
    b = weight2 * mse_loss(inp, target2)
    loss = a + b

    loss.backward()
5 Likes

Yup, you can certainly weigh the losses however you see fit. What you’re doing in your example should be fine.

3 Likes

What if my second loss function requires some value computed from the first loss (or even the gradient of the first loss)? In that case I can't simply add the two losses together; they have to be backpropagated separately, and retain_graph=True gives wrong results, since the intermediate grads are not correct.

see this one

1 Like

Thank you very much, Jordan! Your answer helped me combine my custom loss function with nn.Loss.

Hi BikashgG, I think you could use .type() to change the tensor's type from torch.FloatTensor to torch.LongTensor.
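For example (a small sketch; output and target are placeholder names):

# CrossEntropyLoss expects class indices as a LongTensor target,
# so convert a FloatTensor target before computing the loss
target = target.type(torch.LongTensor)   # or equivalently: target = target.long()
loss = nn.CrossEntropyLoss()(output, target)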

I faced a similar problem when I used CrossEntropyLoss().

Check the official documentation of this loss function; there is a requirement on the type of the tensors you feed in.

Hope it helps,

Peter

What if the losses are computed over different parts of the network - say, loss1 is for the first 3 layers and loss2 is for the first 7 layers (including the first 3)? Wouldn't the sum-of-losses method also backprop loss1 through layers 4-7? Would calling loss1.backward() and loss2.backward() separately be recommended in that case?

2 Likes
