Hello,
I am training a multi-task learning model. Because of our data collection strategy, I have several datasets, each corresponding to a different output branch of the model.
I want to apply DDP to accelerate training. The standard approach seems to be to use a DistributedSampler on each dataloader, but I wonder if the following is possible instead: on each GPU, train one specific dataset with its own loss function, and aggregate the losses (or their gradients) across GPUs to update the shared model.
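To make the question concrete, here is a rough sketch of what I am imagining (MultiTaskModel, task_datasets, task_losses, and the task_id argument are placeholders for my actual code, not a working example):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader

def main():
    # launched with torchrun, so rank/world-size come from the environment
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = MultiTaskModel().cuda(rank)  # placeholder model with one branch per task
    # find_unused_parameters=True since each rank only touches one output branch?
    model = DDP(model, device_ids=[rank], find_unused_parameters=True)

    # each rank gets its own task-specific dataset and loss (placeholders)
    loader = DataLoader(task_datasets[rank], batch_size=32, shuffle=True)
    criterion = task_losses[rank]
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for x, y in loader:
        x, y = x.cuda(rank), y.cuda(rank)
        out = model(x, task_id=rank)  # forward through this rank's branch
        loss = criterion(out, y)
        optimizer.zero_grad()
        loss.backward()  # hoping DDP averages gradients across ranks here
        optimizer.step()

if __name__ == "__main__":
    main()
```

Is relying on DDP's gradient all-reduce like this a valid way to "aggregate the losses", or does each rank need to see every dataset for the synchronization to be correct?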
If possible, are there any tutorials I can follow?
Thanks for your patience and help!