Training a 3D U-Net on multiple GPUs

3D U-Net code:

import torch.nn as nn

class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        # Encoder: the first three conv blocks live on cuda:0
        self.c1 = convBlock(1, 64).to('cuda:0')
        self.d1 = downSample(64).to('cuda:0')
        self.c2 = convBlock(64, 128).to('cuda:0')
        self.d2 = downSample(128).to('cuda:0')
        self.c3 = convBlock(128, 256).to('cuda:0')
        # Deeper encoder levels, bottleneck, and most of the decoder live on cuda:1
        self.d3 = downSample(256).to('cuda:1')
        self.c4 = convBlock(256, 512).to('cuda:1')
        self.d4 = downSample(512).to('cuda:1')
        self.c5 = convBlock(512, 1024).to('cuda:1')
        self.u1 = upSample(1024).to('cuda:1')
        self.c6 = convBlock(1024, 512).to('cuda:1')
        self.u2 = upSample(512).to('cuda:1')
        self.c7 = convBlock(512, 256).to('cuda:1')
        self.u3 = upSample(256).to('cuda:1')
        self.c8 = convBlock(256, 128).to('cuda:1')
        # Last decoder level and output head go back to cuda:0
        self.u4 = upSample(128).to('cuda:0')
        self.c9 = convBlock(128, 64).to('cuda:0')
        self.out = nn.Conv3d(64, 1, 3, 1, 1).to('cuda:0')
        self.act = nn.Sigmoid().to('cuda:0')  # final activation (attribute name assumed)

    def forward(self, x):
        # Each tensor is moved to the device of the layer that consumes it
        L1 = self.c1(x.to('cuda:0'))
        L2 = self.c2(self.d1(L1.to('cuda:0')).to('cuda:0'))
        L3 = self.c3(self.d2(L2.to('cuda:0')).to('cuda:0'))
        L4 = self.c4(self.d3(L3.to('cuda:1')).to('cuda:1'))
        L5 = self.c5(self.d4(L4.to('cuda:1')).to('cuda:1'))
        # Decoder: each upSample takes the deeper features and the matching skip connection
        R4 = self.c6(self.u1(L5.to('cuda:1'), L4.to('cuda:1')).to('cuda:1'))
        R3 = self.c7(self.u2(R4.to('cuda:1'), L3.to('cuda:1')).to('cuda:1'))
        R2 = self.c8(self.u3(R3.to('cuda:1'), L2.to('cuda:1')).to('cuda:1'))
        R1 = self.c9(self.u4(R2.to('cuda:0'), L1.to('cuda:0')).to('cuda:0'))
        return self.act(self.out(R1.to('cuda:0')))


convBlock, downSample, and upSample are layers defined in my own code.
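The definitions of these three layers are not shown, so the following is only a hypothetical stand-in that matches the channel counts used in the UNet class above. It assumes convBlock is two 3x3x3 convolutions with BatchNorm and ReLU, downSample is a 2x max pool, and upSample is a transposed convolution that halves the channels and concatenates the skip connection. The real layers may well differ.

```python
import torch
import torch.nn as nn


class convBlock(nn.Module):
    """Hypothetical stand-in: two 3x3x3 convs, each followed by BatchNorm and ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, 1, 1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, 1, 1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class downSample(nn.Module):
    """Hypothetical stand-in: halves each spatial dimension, keeps the channel count."""

    def __init__(self, ch):
        super().__init__()
        self.pool = nn.MaxPool3d(2)

    def forward(self, x):
        return self.pool(x)


class upSample(nn.Module):
    """Hypothetical stand-in: doubles spatial size, halves channels, concatenates the skip."""

    def __init__(self, ch):
        super().__init__()
        self.up = nn.ConvTranspose3d(ch, ch // 2, kernel_size=2, stride=2)

    def forward(self, x, skip):
        # skip is expected on the same device as x, with ch // 2 channels
        return torch.cat([self.up(x), skip], dim=1)
```

With these stand-ins, upSample(1024) turns a 1024-channel tensor into 512 channels and concatenates a 512-channel skip, which is consistent with the following convBlock(1024, 512).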

I want to train a 3D U-Net, but a single GPU does not have enough memory, so I want to use multiple GPUs to train this model.
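As a rough illustration of where the memory goes, the sketch below estimates the size of one float32 feature map at each encoder level, using the channel widths from the model above. The input edge length of 128, the batch size of 1, and the assumption that each downSample halves every spatial dimension are illustrative choices, not values from the actual setup.

```python
# Rough activation-size estimate per encoder level; all inputs are assumptions.
BYTES_PER_FLOAT32 = 4


def activation_bytes(channels, side, batch=1):
    """Bytes for one float32 feature map of shape (batch, channels, side, side, side)."""
    return batch * channels * side ** 3 * BYTES_PER_FLOAT32


side = 128  # assumed input edge length, not taken from the post
for level, channels in enumerate([64, 128, 256, 512, 1024], start=1):
    mib = activation_bytes(channels, side) / 2 ** 20
    print(f"level {level}: {channels} channels at {side}^3 -> {mib:.0f} MiB per feature map")
    side //= 2  # assumes each downSample halves every spatial dimension
```

Under these assumptions a single full-resolution feature map takes about 512 MiB, while one at the bottleneck takes about 2 MiB. Autograd keeps several such maps alive for the backward pass, so the shallow levels tend to dominate memory use, and a split that balances the number of layers per GPU does not necessarily balance memory.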

To do this, I assign different U-Net layers to different GPUs and move the tensors between devices in forward().
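For context, here is a minimal sketch of one training step with a model split across two devices in this way. TwoStageNet, the dummy tensors, the loss, and the optimizer are all hypothetical and chosen only for illustration; the sketch falls back to CPU when fewer than two GPUs are visible so that it still runs.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")


class TwoStageNet(nn.Module):
    """Hypothetical toy model: first half on dev0, second half on dev1."""

    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Conv3d(1, 4, 3, 1, 1), nn.ReLU()).to(dev0)
        self.stage1 = nn.Sequential(nn.Conv3d(4, 1, 3, 1, 1), nn.Sigmoid()).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(dev0))
        return self.stage1(x.to(dev1))  # the activation is copied across devices here


model = TwoStageNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCELoss()

volume = torch.rand(1, 1, 8, 8, 8)                   # dummy input volume
target = (torch.rand(1, 1, 8, 8, 8) > 0.5).float()   # dummy binary mask

optimizer.zero_grad()
output = model(volume)
loss = criterion(output, target.to(output.device))  # target must sit on the output's device
loss.backward()   # autograd routes gradients back across the device boundary
optimizer.step()  # each parameter is updated on the device where it lives
```

The model itself is never wrapped or moved as a whole; the optimizer simply collects parameters from both devices. Note that in this plain form the stages run one after another, so only one GPU is busy at any moment. The split buys memory capacity, not speed.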

Is this the correct way to train a model across several GPUs? And what is the best way to launch multi-GPU PyTorch training scripts on a Linux server?