Using a combined loss to update two different models

Yes, the error only occurs when using multiple GPUs. I’m sorry, but it’s very hard to put together a minimal reproducible example for this: I use the fairseq framework, and a lot of pieces are attached to it.

If you don’t mind, let’s keep discussing this in the question’s thread.