Best practice for multiple losses on different parts of batch

I have a pose estimation model which I want to train. Some instances have 3D ground truth data, and some have 2D ground truth data. The prediction model is the same for both types; only the loss differs. The two types are loaded by different dataloaders.

I want both types in the same batch. What is the (software engineering) best practice for achieving this?

  1. Pass the two batches separately through the model (two forward and backward calls), then run the optimizer on the accumulated gradients.
  2. Combine both batches (tracking how many samples come from each), feed them through the model in a single forward pass, then split the result before computing each loss.
  3. Calculate both losses on the full combined batch, using dummy ground truth where it is missing, and multiply by a 0/1 mask so only the valid items contribute to the overall loss.
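For reference, here is a minimal sketch of option 2 in PyTorch. The model, loss functions, and tensor shapes are placeholders I invented for illustration, not your actual pose estimation setup; the point is only the concatenate-forward-split pattern:

```python
import torch
import torch.nn as nn

# Stand-ins for the real pose estimation model and the two losses.
model = nn.Linear(10, 6)   # hypothetical model: 10 features in, 6 pose values out
loss_3d = nn.MSELoss()     # placeholder for the 3D loss
loss_2d = nn.MSELoss()     # placeholder for the 2D loss

# One batch from each dataloader (dummy data with made-up shapes).
x3d, y3d = torch.randn(4, 10), torch.randn(4, 6)
x2d, y2d = torch.randn(3, 10), torch.randn(3, 6)

# Concatenate, remember the split point, run ONE forward pass.
n3d = x3d.shape[0]
preds = model(torch.cat([x3d, x2d], dim=0))

# Split the predictions back apart and apply the matching loss to each part.
loss = loss_3d(preds[:n3d], y3d) + loss_2d(preds[n3d:], y2d)
loss.backward()
```

One detail worth noting with this pattern: batch statistics (e.g. BatchNorm) are computed over the combined batch, which may or may not be what you want compared to option 1's two separate forward passes.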

Which approach is likely to be the easiest to understand? Are there any recipes for how best to handle this problem?

Cross-posted at: machine learning - Best practice for multiple losses on different parts of batch - Stack Overflow