Optimizing losses on different GPUs

I want to train two different models that are too large to fit on the same GPU. The problem is that I need to optimize their total loss jointly, and with the models on different GPUs I am having trouble implementing the backpropagation. I have tried several approaches, but none of them worked. Below is a draft of my experiment architecture. Any suggestions are appreciated. Thanks!


Your problem seems to be the same as this.

Thanks for your reply. I’m now moving all losses to GPU 0 and then doing backpropagation. This seems to work, since the parameters in both Model1 and Model2 are updated. However, I notice that the parameters in Model2 change only slightly compared to those in Model1. I’m not sure whether backpropagation through Model2 is affected by where the losses are located, or whether it is entirely due to the way I’m using the loss. Please let me know if this works for you.
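
For readers following along, here is a minimal sketch of the approach described in this post, assuming the framework is PyTorch (the thread does not say so explicitly). The two `nn.Linear` layers are hypothetical stand-ins for Model1 and Model2, and the data, loss function, and learning rate are made up for illustration. The sketch moves the second loss to the first device, sums the two, and calls `backward()` once. It also prints the gradient norm of each model, which may help tell whether small updates in Model2 come from small gradients rather than from where the losses are placed. The script falls back to CPU when two GPUs are not available.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the sketch still runs.
two_gpus = torch.cuda.is_available() and torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0" if two_gpus else "cpu")
dev1 = torch.device("cuda:1" if two_gpus else "cpu")

torch.manual_seed(0)
model1 = nn.Linear(8, 1).to(dev0)  # hypothetical stand-in for Model1
model2 = nn.Linear(8, 1).to(dev1)  # hypothetical stand-in for Model2

# A single optimizer can hold parameters that live on different devices.
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()), lr=0.1
)

# Made-up inputs and targets, created on CPU and copied to each model's device.
x = torch.randn(16, 8)
y = torch.randn(16, 1)

loss1 = nn.functional.mse_loss(model1(x.to(dev0)), y.to(dev0))
loss2 = nn.functional.mse_loss(model2(x.to(dev1)), y.to(dev1))

# .to() is differentiable, so moving loss2 to dev0 keeps it in the autograd graph.
total_loss = loss1 + loss2.to(dev0)

optimizer.zero_grad()
total_loss.backward()


def grad_norm(model):
    """Return the L2 norm over all parameter gradients of `model`."""
    squared = sum((p.grad ** 2).sum() for p in model.parameters())
    return torch.sqrt(squared).item()


norm1 = grad_norm(model1)
norm2 = grad_norm(model2)
print(f"grad norm model1: {norm1:.4f}, model2: {norm2:.4f}")

optimizer.step()
```

Because the total is a plain sum, the gradient that reaches Model2 is the gradient of its own loss term, regardless of which device holds the sum. If the printed norm for Model2 is much smaller than the one for Model1, the scale of the second loss (or any weight applied to it) is a more likely cause than the device placement.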