In my program, I have to build two different models for training. However, my CUDA memory overflows immediately. So I want to distribute the training of the two models across different GPUs. My final loss consists of two parts: one part is independent per model and the other is joint.
With your code snippet you'll most likely get an error stating that some tensors are not on the same device.
Since outputA and outputB are on GPU0 and GPU1, respectively, you should push them to the same device before computing the joint loss.
Could you try the following:
...
lossC = torch.nn.functional.cosine_similarity(outputA, outputB.to('cuda:0')) # lossC is now on cuda:0
final_loss = lossA + lossB.to('cuda:0') + lossC
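To make the idea concrete, here is a minimal, self-contained sketch of this model-parallel setup. The two small linear models, the dummy inputs, and the per-model losses are placeholders for illustration only; the code falls back to the CPU when two GPUs are not available, so you can run it anywhere:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical two-device setup; fall back to CPU if two GPUs are unavailable.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device('cuda:0' if two_gpus else 'cpu')
dev1 = torch.device('cuda:1' if two_gpus else 'cpu')

# Two toy models standing in for modelA and modelB, each on its own device.
modelA = nn.Linear(16, 8).to(dev0)
modelB = nn.Linear(16, 8).to(dev1)

x = torch.randn(4, 16)
outputA = modelA(x.to(dev0))
outputB = modelB(x.to(dev1))

# Independent losses, one per device (dummy objectives for illustration).
lossA = outputA.pow(2).mean()
lossB = outputB.pow(2).mean()

# Joint loss: move outputB to dev0 first. The .to() call is recorded by
# autograd as a differentiable op, so gradients flow back to modelB on dev1.
lossC = 1.0 - F.cosine_similarity(outputA, outputB.to(dev0), dim=1).mean()

final_loss = lossA + lossB.to(dev0) + lossC
final_loss.backward()

# Both models receive gradients on their own devices.
assert modelA.weight.grad is not None
assert modelB.weight.grad is not None
```

Note that `1 - cosine_similarity` is used here so that the joint term behaves as a loss to minimize; autograd handles the cross-device transfers transparently.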
Yes, you are right. But lossB has been moved to GPU 0. I wonder whether that could affect the gradient of modelB in the backward pass. Will modelB's parameters still be updated correctly?