DistributedDataParallel (DDP) and Multi-task Learning (MTL) where model outputs might not contribute to the loss

A similar question has been asked before: Process got stuck when set find_unused_parameters=True in DDP - #3 by oliver_ss, and the solution there seems similar to what I have done by faking a forward pass. However, it remains unclear whether this approach is sound and good.
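For concreteness, here is a minimal sketch of the zero-weighted "fake" contribution I mean. The model, head names, and loss are hypothetical stand-ins, not my actual code:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    # Hypothetical two-head model; on any given batch only one
    # head's output may contribute to the loss.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.head_a = nn.Linear(16, 4)
        self.head_b = nn.Linear(16, 4)

    def forward(self, x):
        h = self.backbone(x)
        return self.head_a(h), self.head_b(h)

model = MultiTaskNet()  # in practice wrapped in DistributedDataParallel
out_a, out_b = model(torch.randn(8, 16))

# Suppose this batch only has labels for task A. Instead of ignoring
# out_b (which leaves head_b's parameters unused and can stall DDP),
# "fake" its contribution with a zero-weighted term so every parameter
# still participates in the autograd graph.
loss = out_a.sum()               # stand-in for the real task-A loss
loss = loss + 0.0 * out_b.sum()  # gradient flows through head_b but is zero

loss.backward()
```

Since every parameter is then reachable from the loss, DDP's reducer receives a gradient for every bucket, and `find_unused_parameters` can stay `False`, avoiding the per-iteration autograd graph traversal that the flag adds.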

@zalador what do you mean by "if this approach is sound and good"?