Train model on GPU and CPU

Hi, I have a model with two components, one on gpu and the other on cpu.
Let’s say Loss_1 is computed on the GPU and Loss_2 on the CPU, and I do Loss = Loss_1 + Loss_2.to('cuda'). I’m wondering how loss.backward() is computed, since there are two devices?


The .to() operator is differentiable (it just moves values to the right device), so the backward pass will work exactly the same as if everything was on a single device. The backward ops will run on the same device as the forward ones.
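
For illustration, here is a minimal sketch of a setup like the one in the question. The two `nn.Linear` components, the squared-output losses, and the names `model_gpu` and `model_cpu` are made up for the example and are not from the original post. It falls back to the CPU when no GPU is available, so it should run anywhere.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch still runs on a machine without a GPU.
gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
model_gpu = nn.Linear(4, 1).to(gpu)  # hypothetical component living on the GPU
model_cpu = nn.Linear(4, 1)          # hypothetical component living on the CPU

x = torch.randn(8, 4)
loss_1 = model_gpu(x.to(gpu)).pow(2).mean()  # computed on the GPU
loss_2 = model_cpu(x).pow(2).mean()          # computed on the CPU

# .to() is recorded in the autograd graph, so gradients flow back through it.
loss = loss_1 + loss_2.to(gpu)
loss.backward()

# Each parameter's gradient ends up on the same device as the parameter.
print("loss device:     ", loss.device)
print("model_gpu grad on", model_gpu.weight.grad.device)
print("model_cpu grad on", model_cpu.weight.grad.device)
```

After `backward()`, the CPU component's gradients sit on the CPU and the GPU component's gradients sit on the GPU, so an optimizer over both sets of parameters can step as usual.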