Split Perceptual Loss Model and Inference Model over multiple GPUs

Hi everybody,

I am trying to implement an inpainting model that uses a perceptual loss.
The training routine works fine on a single GPU, but I run into a CUDA out-of-memory error when using multiple GPUs (with the same batch size).
This is the code I am using:

import torch

use_cuda = torch.cuda.is_available()
cuda_device_count = torch.cuda.device_count()
device = torch.device("cuda:0" if use_cuda else "cpu")

NET = DeFINe()
vgg16_partial = Vgg16()

if cuda_device_count > 1:
    print("Use", cuda_device_count, "GPUs!")
    NET = torch.nn.DataParallel(NET)
    vgg16_partial = torch.nn.DataParallel(vgg16_partial)

NET.to(device)
vgg16_partial.to(device)

Is there an example of how to solve this kind of problem?
My goal is to split the models across all available devices.
Best regards, and thank you!

nn.DataParallel might use more memory on the default device as described here.
The blog post also mentions some workarounds.
Alternatively, have a look at nn.DistributedDataParallel.