Hi Tony-Y,
Your example is great. I am a beginner in PyTorch. I am using a multi-task approach for two different tasks and want to adopt this approach. I have a ResNet-50 as the backbone and added two branches for the two tasks. Now I want to use this multi-task loss for these two tasks. Could you please briefly show how to use your MultiTaskLoss class in my case? Below is my code:
model= multi_output_model(pretrainedImgnetModel,cup_nodes,bbc_type_nodes)
criterion = [nn.CrossEntropyLoss(),nn.CrossEntropyLoss()] # two loss function for two task
First of all, you need to check the documentation of nn.CrossEntropyLoss. F.softmax should not be applied to y1o because softmax is already included in CrossEntropyLoss. In addition, you should confirm whether applying sigmoid to y2o is appropriate.
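To make the point about nn.CrossEntropyLoss concrete, here is a minimal sketch. The logits and targets are made-up values, not taken from the code in this thread. It compares the loss on raw logits with the loss obtained after an extra softmax.

```python
import torch
import torch.nn.functional as F

# Raw, unnormalized scores for 2 samples and 3 classes (made-up values).
logits = torch.tensor([[2.0, 0.0, -1.0],
                       [0.0, 3.0, 0.0]])
target = torch.tensor([0, 1])  # class indices

# cross_entropy applies log_softmax internally, so it expects raw logits.
loss_logits = F.cross_entropy(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Applying softmax first normalizes twice and flattens the scores,
# which gives a different (here larger) loss for the same predictions.
loss_double = F.cross_entropy(F.softmax(logits, dim=1), target)

print(loss_logits.item(), loss_manual.item(), loss_double.item())
```

The first two values agree, while the third does not, which is why the branch output should be passed to the loss without F.softmax.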
Hi Tony-Y,
My branch y1o is for multi-label classification and y2o is for binary classification. So I apply sigmoid to y2o to get values between 0 and 1, and softmax to y1o so that all output probabilities sum to 1.
Please correct me if I am wrong.
Thanks a lot, Tony. I have read it. But for binary classification, BCELoss and CrossEntropyLoss should be the same, shouldn't they? In that case, is my code OK?
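On the question of whether the two losses coincide for binary classification, the following sketch may help. It assumes one logit per sample for the binary-cross-entropy path and builds a two-class logit pair [0, z] for the cross-entropy path; all values are made up. The two agree only under this setup. Passing a single sigmoid output to nn.CrossEntropyLoss is a different computation.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([1.5, -0.3, 0.0, 2.0])  # one logit per sample (made-up)
y = torch.tensor([1, 0, 1, 0])           # binary labels

# Binary cross entropy on a single logit; sigmoid is applied internally.
bce = F.binary_cross_entropy_with_logits(z, y.float())

# Two-class cross entropy with logits [0, z]:
# softmax([0, z])[1] equals sigmoid(z), so the losses match.
two_class_logits = torch.stack([torch.zeros_like(z), z], dim=1)
ce = F.cross_entropy(two_class_logits, y)

print(bce.item(), ce.item())
```

nn.BCELoss expects probabilities (after sigmoid) and float targets, while nn.CrossEntropyLoss expects raw logits with one column per class and integer class indices, so the two are not interchangeable without changing the branch output.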
It would be nice if you could explain a little how I can adopt MultiTaskLoss for my case.
I think we should not use torch.nn.XX or torch.nn.functional.XX to get losses in the forward function. For those who are stuck here because the losses do not change, I have reimplemented the example from the author of the paper using PyTorch: PyTorch Example.
I wrote an example code and it seemed to be working.
The key might be to make the optimizer recognize the learnable parameters (the multi-task loss's sigmas).
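A minimal sketch of that idea follows. `LearnableWeights` is a hypothetical stand-in for a multi-task loss module, and the model and data are toy placeholders. The point is only that the loss module's parameters must be passed to the optimizer together with the model's parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LearnableWeights(nn.Module):
    """Hypothetical stand-in for a multi-task loss with learnable weights."""
    def __init__(self, n_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        return (torch.exp(-self.log_vars) * losses + 0.5 * self.log_vars).sum()

model = nn.Linear(4, 2)          # toy model with two outputs
mtl = LearnableWeights(n_tasks=2)

# Pass both parameter sets; with model.parameters() alone,
# log_vars would receive gradients but never be updated.
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(mtl.parameters()), lr=0.1
)

out = model(torch.ones(3, 4))
losses = torch.stack([out[:, 0].pow(2).mean(), out[:, 1].pow(2).mean()])

before = mtl.log_vars.detach().clone()
optimizer.zero_grad()
mtl(losses).backward()
optimizer.step()
print(before, mtl.log_vars.detach())
```

If the printed values are identical after several steps in a real training loop, the loss parameters are probably missing from the optimizer.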
Yes, that’s right.
If you have loss1, loss2, and loss3, which are cross entropy loss, cross entropy loss, and MSE loss respectively, you should pass “is_regression = torch.Tensor([False, False, True])” for the constructor, because only the MSE loss is a regression loss.
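For readers who want to see how such a flag can enter the computation, here is a minimal sketch of an uncertainty-weighted loss. It assumes the common formulation in which a regression loss is scaled by 1/(2σ²), a classification loss by 1/σ², and log σ is added as a regularizer, with `is_regression` set to True for regression losses such as MSE and False for classification losses such as cross entropy. `UncertaintyWeightedLoss` is a hypothetical name and is not the MultiTaskLoss class posted in this thread.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Hypothetical sketch, not the class posted in this thread."""
    def __init__(self, is_regression):
        super().__init__()
        # Buffer so the flags follow the module across devices.
        self.register_buffer("is_regression", is_regression.float())
        # Learn s_i = log(sigma_i^2); zeros means sigma_i = 1 at the start.
        self.log_vars = nn.Parameter(torch.zeros(len(is_regression)))

    def forward(self, losses):
        # Regression: L / (2 sigma^2); classification: L / sigma^2.
        coeffs = torch.exp(-self.log_vars) / (self.is_regression + 1.0)
        # log(sigma) = 0.5 * log(sigma^2) is the regularization term.
        return (coeffs * losses + 0.5 * self.log_vars).sum()

# cross entropy, cross entropy, MSE -> False, False, True
mtl = UncertaintyWeightedLoss(torch.tensor([False, False, True]))
losses = torch.tensor([0.7, 1.2, 4.0])  # made-up per-task loss values
total = mtl(losses)                     # 0.7 + 1.2 + 4.0 / 2 at initialization
print(total.item())
```

In a training loop, `losses` would be built with `torch.stack` from the per-task loss tensors so that gradients reach both the network and `log_vars`.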
I’d like to hear whether this multi task loss implementation works in your setting, too.
Dear all,
thanks for your hints on MTL loss. I have been experimenting with MTL loss for a while now and I am facing some problems. I am doing image-to-image regression with two different losses. After trying (many) different things, I found that initializing the sigmas to one gives the best tradeoff in performance across the two tasks.
The problem is that one sigma is increasing and the other is decreasing. Can this be considered normal? Since my experiment is GPU intensive, I am experimenting with a short training schedule for prototyping. However, the increasing sigma leads to much worse performance with the full training schedule, giving completely different (very bad) results.
I tried implementing @Tony-Y’s MultiTaskLoss() example above to balance the 3 loss terms in our https://github.com/ultralytics/yolov3 repo, but no luck. The total loss did reduce, but 90% of the loss component was made up of the self.eta constants. Our 3 balance parameters ended up between 1.0 and 2.0. One thing I realized may be missing is a constraint that all loss weights sum to 1 for example, but I did not explore further.
We have used hyperparameter evolution to successfully balance our loss terms (at great GPU expense), but this evolves to a specific task (i.e. YOLOv3 COCO with 80 classes), making it a suboptimal solution for loss balancing for people adopting the repo for their custom datasets.
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
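This error is raised when backward() runs a second time over a graph whose saved buffers were already released. The poster's code is not shown, so the cause here is an assumption. A common trigger is reusing a loss tensor, or a tensor derived from the learnable sigmas, that was computed once outside the training loop. The sketch below reproduces the error on a toy tensor and shows the usual fix of rebuilding the graph in every iteration.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Problem: the loss is built once, then backward() is called on it twice.
loss = (w * w).sum()
loss.backward()
try:
    loss.backward()  # the saved buffers were freed by the first call
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Usual fix: rerun the forward computation in every iteration,
# so each backward() call works on a fresh graph.
w.grad = None
for _ in range(2):
    loss = (w * w).sum()
    loss.backward()

print(second_call_failed, w.grad)
```

Passing retain_graph=True silences the error but keeps the buffers in memory, so recomputing the forward pass is generally the safer choice unless the graph really must be reused.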
Hey Stefano - it probably cannot be considered normal if you are getting very bad results with the full training schedule. Did you try using sigma and 1-sigma instead, so that there is only one learnable parameter instead of two separate sigmas?
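One possible reading of the sigma and 1-sigma suggestion, sketched under stated assumptions: both tasks are regression losses, a single raw parameter is squashed with a sigmoid so that sigma_1 stays in (0, 1), and sigma_2 is defined as 1 - sigma_1. `TiedSigmaLoss` is a hypothetical name, and the loss values are made up.

```python
import torch
import torch.nn as nn

class TiedSigmaLoss(nn.Module):
    """Hypothetical: sigma_1 = w and sigma_2 = 1 - w share one parameter."""
    def __init__(self):
        super().__init__()
        self.raw = nn.Parameter(torch.zeros(()))  # sigmoid(0) = 0.5

    def forward(self, loss1, loss2):
        sigma1 = torch.sigmoid(self.raw)  # keeps sigma_1 in (0, 1)
        sigma2 = 1.0 - sigma1
        # Regression form for both tasks: L / (2 sigma^2) + log(sigma).
        return (loss1 / (2 * sigma1 ** 2) + torch.log(sigma1)
                + loss2 / (2 * sigma2 ** 2) + torch.log(sigma2))

tied = TiedSigmaLoss()
total = tied(torch.tensor(1.0), torch.tensor(3.0))  # made-up loss values
total.backward()
print(total.item(), tied.raw.grad.item())
```

With this coupling the two sigmas cannot both grow, since increasing one shrinks the other. Whether that helps in a given setting is an open question, not something this sketch demonstrates.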
Thanks, Stuart, for your suggestion!
I haven’t tried exactly that, but I will give it a try! After some time, what I noticed is that I was getting better results with the “short” prototyping schedule because both variances were slightly increasing (thus decreasing both losses). As a result, I got better (almost acceptable) results while prototyping, but bad results with the full schedule.
My guess is that my two tasks are correlated and strongly depend on the network's training bootstrap phase, somehow invalidating the hypotheses of the MTL paper. Maybe task independence is necessary for the log-likelihood estimation? I am just guessing.
Yeah, give that a go. It could sort out the problem, and I am interested to hear if it does! If you think the two tasks are not independent, it might be possible to test that theory. If they are very tightly linked, perhaps the internal learned feature representations (or even the initial raw inputs) for one task could be good for the other. I don’t know all the details of your use case, but if you tried to use features from one task with the targets from your other task, that could give useful insight.