Suppose I have two models, A and B.
I want to chain them like an nn.Sequential(), but a single GPU does not have enough memory for both, so A and B are placed on different GPUs:
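X = X.cuda(0)    # tensors are not moved in place; .cuda() returns a copy
A.cuda(0)        # modules are moved in place
B.cuda(1)
Y1 = A(X)
Y2 = Y1.cuda(1)  # copy the intermediate result to the second GPU
Z = B(Y2)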
Will Z.backward() propagate gradients back across the two GPUs, i.e. Z->Y2->Y1->X?
And how should the optimizers be set up when the parameters of A and B live on different GPUs?
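
For reference, here is a minimal sketch of the setup described above. The nn.Linear layers, their sizes, the batch size, and the learning rate are hypothetical stand-ins for A and B, not the real models. The sketch falls back to the CPU when two GPUs are not available, and it reduces the output to a scalar so that backward() can be called without arguments. It assumes that the cross-device copy is recorded by autograd like any other operation and that one optimizer may hold parameters from both devices.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the sketch still runs.
two_gpus = torch.cuda.is_available() and torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0" if two_gpus else "cpu")
dev1 = torch.device("cuda:1" if two_gpus else "cpu")

# Hypothetical stand-ins for the real models A and B.
A = nn.Linear(8, 16).to(dev0)
B = nn.Linear(16, 4).to(dev1)

# One optimizer over both parameter sets; each parameter is updated on its own device.
params = list(A.parameters()) + list(B.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

# requires_grad=True only so that the gradient reaching X can be inspected.
X = torch.randn(32, 8, device=dev0, requires_grad=True)

Y1 = A(X)
Y2 = Y1.to(dev1)   # differentiable copy to the second device
Z = B(Y2).sum()    # reduce to a scalar so backward() needs no argument

optimizer.zero_grad()
Z.backward()       # gradients flow Z -> Y2 -> Y1 -> X
optimizer.step()

print(X.grad.device, A.weight.grad.device, B.weight.grad.device)
```

Using one optimizer for both models is a choice made here for brevity; two separate optimizers, one per model, should behave the same way as long as both are stepped after backward().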