Split Perceptual Loss Model and Inference Model over multiple GPUs

Hi everybody,

I am trying to implement an inpainting model that uses a perceptual loss.
The training routine works fine on a single GPU, but I run into a CUDA out-of-memory error when using multiple GPUs (with the same batch size).
This is the code I am using:

import torch

use_cuda = torch.cuda.is_available()
cuda_device_count = torch.cuda.device_count()
device = torch.device("cuda:0" if use_cuda else "cpu")

NET = DeFINe()
vgg16_partial = Vgg16()

if cuda_device_count > 1:
    print("Use", cuda_device_count, "GPUs!")
    NET = torch.nn.DataParallel(NET)
    vgg16_partial = torch.nn.DataParallel(vgg16_partial)

NET.to(device)
vgg16_partial.to(device)

Is there an example of how to solve this kind of problem?
My goal is to split the models across all available devices.
Best regards, and thank you!

nn.DataParallel might use more memory on the default device as described here.
The blog post also mentions some workarounds.
Alternatively, have a look at nn.DistributedDataParallel.