If you feed your dropout_1 output into the four Linear layers (each probably with 4 outputs, not 36), you can calculate the losses individually and add them up to form a total loss. Using that total loss for gradient descent will do the right thing.
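For illustration, here is a minimal sketch of that layout (not the original poster's model), assuming a shared feature extractor called backbone that emits feature_dim features; all names here are made up:

import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, backbone, feature_dim=512, num_tasks=4, num_classes=4):
        super().__init__()
        self.backbone = backbone  # shared feature extractor
        self.dropout_1 = nn.Dropout(0.5)
        # one Linear head per task, each with num_classes outputs
        self.heads = nn.ModuleList(
            [nn.Linear(feature_dim, num_classes) for _ in range(num_tasks)]
        )

    def forward(self, x):
        features = self.dropout_1(self.backbone(x))
        # return a list with one logit tensor per task
        return [head(features) for head in self.heads]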
I use nn.CrossEntropyLoss() as my criterion function, with the labels as shown above. The code for the loss is as follows:
# forward
outputs = model(inputs)  # list with one logit tensor per task
loss = 0
for lx in range(len(outputs)):
    tmp_loss = criterion(outputs[lx], labels[:, lx])
    loss += tmp_loss
# backward + optimize only if in training phase
if phase == 'train':
    loss.backward()
    optimizer.step()
The main problem here is how to combine the losses for the labels I have, because 0 stands for both "no label" and "invisible", so the two cases are hard to tell apart.
I would recommend using a mask (1 = label present, 0 = no label) and labels 0-3 (it doesn't matter which value when the mask is 0). Then you can take the loss with reduction='none' (the replacement for the deprecated reduce=False), multiply it by the mask, sum over the tasks, and take the mean over the batch.
In terms of your code above (CrossEntropyLoss seems good to me unless you have a reason not to use it):
criterion = nn.CrossEntropyLoss(reduction='none')
...
loss = 0
for lx in range(len(outputs)):
    loss = loss + (mask[:, lx] * criterion(outputs[lx], labels[:, lx])).mean()
(So with reduction='none' the loss is still reduced over the classes, giving you a float vector of length batch_size; mask should be a float tensor of shape batch_size x tasks.)
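To make the mask concrete, here is one way to build it, assuming the raw labels use 0 for "no label"/"invisible" and 1-4 for the real classes (raw_labels is a made-up name; adjust to your encoding):

# raw_labels: LongTensor of shape (batch_size, num_tasks),
# 0 = no label, 1..4 = actual classes
mask = (raw_labels > 0).float()          # 1.0 where a label exists, 0.0 otherwise
labels = (raw_labels - 1).clamp(min=0)   # shift to 0..3; value is irrelevant where mask is 0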
Looking at the CrossEntropyLoss documentation, I also found an alternative to using the mask: if you put -100 in your label where you don't have one, you can use the ignore_index feature and it'll do the right thing for you without needing changes to your code above.
You can use label -1 to mark "no label" and
set criterion = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
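A minimal self-contained example of that approach (the tensor values here are made up):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
logits = torch.randn(3, 4)         # batch of 3 samples, 4 classes
labels = torch.tensor([2, -1, 0])  # the second sample has no label
loss = criterion(logits, labels)   # the -1 entry is skipped; mean over the rest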
I want to know: for multi-task learning, the label is a list of labels, so how do you create your Dataset? Could you please show me your Dataset definition? Thank you very much~
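Not the poster's actual code, but a minimal sketch of what such a Dataset could look like, assuming images on disk and one integer label per task (all names here are hypothetical):

import torch
from torch.utils.data import Dataset
from PIL import Image

class MultiTaskDataset(Dataset):
    """Each sample is an image plus one integer label per task."""
    def __init__(self, image_paths, label_lists, transform=None):
        # label_lists[i] holds the num_tasks labels for image i
        self.image_paths = image_paths
        self.label_lists = label_lists
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        labels = torch.tensor(self.label_lists[idx], dtype=torch.long)
        return image, labels  # labels has shape (num_tasks,), matching labels[:, lx] above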
Shameless plug: I wrote a small helper library that makes multi-task learning a little easier: torchMTL. It should work for your example and makes it easy to combine the losses while keeping control over the training loop. I thought it might be of interest to people running into similar issues.