How to do multi-task training?

I am quite confused about how to do multi-task training.

For example, I have a picture of an animal, and I want to get four kinds of output:

  • Length of Nose / invisible, long, middle, short
  • Length of Tail / invisible, long, middle, short, no tail
  • Length of Hand / invisible, long, middle, short
  • Length of Leg / invisible, long, middle, short

What should I do if I want to use a pre-trained ResNet model and do the four classification tasks in one network? The structure looks like this:

I searched on Google, but I could only find the following links, and they did not work for me.


If you put your dropout_1 output into the 4 Linear layers (probably with 4 outputs, not 36) you can then calculate the losses individually and add them up to form a total loss.

Using that for gradient descent will do the right thing.
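The suggestion above can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual model: MultiHeadNet, the small linear trunk standing in for the pre-trained ResNet plus dropout, and all sizes except the per-task class counts (4, 5, 4, 4) are assumptions made up for the example.

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared trunk followed by one linear head per task (hypothetical example)."""

    def __init__(self, in_features=16, hidden=32, classes_per_task=(4, 5, 4, 4)):
        super().__init__()
        # Stand-in for the pre-trained backbone followed by dropout
        self.trunk = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Dropout(0.5)
        )
        # One head per task: nose, tail, hand, leg
        self.heads = nn.ModuleList([nn.Linear(hidden, n) for n in classes_per_task])

    def forward(self, x):
        features = self.trunk(x)
        return [head(features) for head in self.heads]

torch.manual_seed(0)
model = MultiHeadNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 16)
# One label column per task, every task labelled in this toy batch
labels = torch.stack([torch.randint(0, n, (8,)) for n in (4, 5, 4, 4)], dim=1)

outputs = model(inputs)
# Individual losses per task, added up to form the total loss
total_loss = sum(criterion(out, labels[:, i]) for i, out in enumerate(outputs))

optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

Because all heads share the trunk, a single backward pass on the summed loss sends gradients from every task into the shared features.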

Best regards


I figured out how to construct my model, but the problem is how to calculate the loss. My case is kind of special: each picture only has a label for one of the tasks.

To be more specific, a picture may only have a ground-truth label for Length of Hand, and none for the other three tasks.

Pic -> [no, no, 'long', no], where no means we don't have the label. I use 0 to represent no, which is the same value as Invisible.

For now, I add all the losses up as my loss, but I think that is not accurate. Do you have any suggestions?


Just add the losses of labels that you have.

I use nn.CrossEntropyLoss() as my criterion function, and the labels are as follows:


And the code for the loss is as follows:

# forward
outputs = model(inputs)
loss = 0

for lx in range(len(outputs)):
    tmp_loss = criterion(outputs[lx], labels[:, lx])
    loss += tmp_loss

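# backward + optimize only if in training phase
# (optimizer is assumed to be defined earlier in the training script)
if phase == 'train':
    loss.backward()
    optimizer.step()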

The main problem here is how to add only the losses of the labels I have. Because 0 stands for both No Label and Invisible, it is not easy to tell them apart.

Should I use another criterion function?

I tried this nn.MultiLabelSoftMarginLoss() criterion:

But the accuracy is not as good as with CrossEntropyLoss. I don't know what to do now…

So you don't have the distinction, or do you lose it in data preparation?

Ahh, I can use 0 to stand for no label and start the class indices from 1, but then there will be 5 outputs per task where there should be 4.

Even if I use 0 for no label, the loss is computed for a whole batch and comes out as a single number (a Variable). I am stuck here: how do I add up the loss only where there is a label?

I would recommend using a mask (1 = label, 0 = no label) and labels 0…3 (the label value doesn't matter where the mask is 0). Then you can take the loss with reduce=False (reduction='none' in newer PyTorch versions), multiply it with the mask, sum over the tasks and take the mean over the batch.

Best regards



Yeah, I am trying this. Do I need to change the criterion function?

In terms of your code above (CrossEntropyLoss seems good to me unless you have a reason not to use it):

criterion = nn.CrossEntropyLoss(reduction='none')  # reduce=False in older PyTorch versions
loss = 0
for lx in range(len(outputs)):
    loss = loss + (mask[:, lx] * criterion(outputs[lx], labels[:, lx])).mean()

(So with reduction='none' the loss still reduces over the classes, giving you a float vector of length batch_size; mask should be a float tensor of shape batch_size x tasks.)
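To make the mask idea concrete, here is a small self-contained sketch with random logits in place of real model outputs. The batch, labels, and mask values are invented for illustration; only the per-task class counts (4, 5, 4, 4) come from the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, classes_per_task = 3, (4, 5, 4, 4)
# Random logits standing in for the four head outputs of a model
outputs = [torch.randn(batch_size, n, requires_grad=True) for n in classes_per_task]

# Each picture has exactly one labelled task; where mask == 0 the label is a placeholder
labels = torch.tensor([[0, 0, 1, 0],
                       [2, 0, 0, 0],
                       [0, 4, 0, 0]])
mask = torch.tensor([[0., 0., 1., 0.],
                     [1., 0., 0., 0.],
                     [0., 1., 0., 0.]])

criterion = nn.CrossEntropyLoss(reduction='none')
loss = 0
for lx in range(len(outputs)):
    per_sample = criterion(outputs[lx], labels[:, lx])  # shape: (batch_size,)
    loss = loss + (mask[:, lx] * per_sample).mean()     # masked entries contribute 0
loss.backward()

# The leg head (index 3) has no label in this batch, so its gradient is all zeros
leg_grad = outputs[3].grad
```

Note that .mean() here divides by the full batch size, including the masked entries, so the loss value shrinks when few samples are labelled. Dividing by mask[:, lx].sum() instead would be one possible alternative, as long as that sum is not zero.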

Looking at the CrossEntropyLoss documentation, I also found an alternative to using the mask: If you put -100 in your label where you don’t have one, you can use the ignore_index feature and it’ll do the right thing for you without needing changes to your code above.
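A quick way to see what ignore_index does is to compare it with selecting the labelled samples by hand. The logits and labels below are made up for a single task; -100 is the default ignore_index of nn.CrossEntropyLoss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 4)                 # one task, batch of 4, 4 classes
labels = torch.tensor([2, -100, 0, -100])  # -100 marks "no label"

criterion = nn.CrossEntropyLoss()          # ignore_index defaults to -100
loss_ignore = criterion(logits, labels)

# Same value computed by hand: average over the labelled samples only
has_label = labels != -100
loss_manual = criterion(logits[has_label], labels[has_label])
```

With the default mean reduction, the average is taken over the labelled samples only, so the scale differs from the mask version that averages over the whole batch. One thing to watch for, as far as I know: if a batch has no label at all for a task, the mean is taken over zero samples and can come out as nan.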

Best regards



Wow, that will be extremely helpful! I will try it now and report back later!



It works to some extent, but the loss is high compared to Keras with the same structure. What could be the cause? Is it a problem with the output block?

  (conv2d_7b): BasicConv2d(
    (conv): Conv2d(2080, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True)
    (relu): ReLU()
  (avgpool_1a): AvgPool2d(kernel_size=8, stride=8, padding=0, ceil_mode=False, count_include_pad=False)
  (last_linear): RetBlock(
    (liner_nose): Linear(in_features=1536, out_features=4, bias=True)
    (liner_tail): Linear(in_features=1536, out_features=5, bias=True)
    (liner_hand): Linear(in_features=1536, out_features=4, bias=True)
    (liner_leg): Linear(in_features=1536, out_features=4, bias=True)

My code for output block:

import torch.nn as nn

class RetBlock(nn.Module):
    """Output block with one linear head per task."""

    def __init__(self):
        super().__init__()
        in_features = 1536
        self.liner_nose = nn.Linear(in_features, 4)
        self.liner_tail = nn.Linear(in_features, 5)
        self.liner_hand = nn.Linear(in_features, 4)
        self.liner_leg = nn.Linear(in_features, 4)

    def forward(self, x):
        x_nose = self.liner_nose(x)
        x_tail = self.liner_tail(x)
        x_hand = self.liner_hand(x)
        x_leg = self.liner_leg(x)
        # Exactly one output per task, in the same order as the label columns
        return x_nose, x_tail, x_hand, x_leg

model_ft.last_linear = RetBlock()

Is anything wrong with this?



Hi Dave, have you solved this problem? Could you put your code on GitHub? I want to learn from it.
Best wishes!

Hi Dave, would you please share your code? Thanks.

You can use the label -1 to mark a missing label and set
criterion = nn.CrossEntropyLoss(ignore_index=-1, reduction="mean")
I would also like to know: for multi-task training the label is a list of labels, so how do you create your Dataset? Could you please show me the Dataset you defined? Thank you very much~
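One possible way to set up such a Dataset is sketched below. It is not the original poster's code: AnimalPartsDataset, the annotation dictionaries, and the dummy image tensors are all hypothetical, and -1 is used for missing labels to match ignore_index=-1 above.

```python
import torch
from torch.utils.data import Dataset, DataLoader

IGNORE = -1  # must match ignore_index in the criterion

class AnimalPartsDataset(Dataset):
    """Hypothetical dataset returning (image, label vector with one entry per task)."""

    TASKS = ("nose", "tail", "hand", "leg")

    def __init__(self, images, annotations):
        # images: list of image tensors
        # annotations: list of dicts such as {"hand": 1}; missing keys mean "no label"
        self.images = images
        self.annotations = annotations

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        label = torch.tensor([ann.get(t, IGNORE) for t in self.TASKS], dtype=torch.long)
        return self.images[idx], label

# Dummy data: three blank "images", each with a label for a single task
images = [torch.zeros(3, 8, 8) for _ in range(3)]
annotations = [{"hand": 1}, {"nose": 2}, {"tail": 4}]

loader = DataLoader(AnimalPartsDataset(images, annotations), batch_size=3)
inputs, labels = next(iter(loader))
# labels has shape (batch_size, 4), so labels[:, lx] is the target column for task lx
```

The default collate function stacks the per-sample label vectors into one batch tensor, which fits the labels[:, lx] indexing used in the loss loop earlier in the thread.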

Shameless plug: I wrote a little helper library that makes it a little easier to do multi-task learning: torchMTL. It should work for your example and makes it easy to combine the losses while keeping control over the training loop. I thought it might be of interest for people who are running into similar issues.
