Multi-GPU training

I have 3 GPUs (80 GB each) on my virtual machine; however, for some reason PyTorch only uses one of them.
Please see a minimal reproduction of my code below:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"  # comma-separated device ids, no spaces

import torch
from torch import nn
from torch.utils.data import DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Network(**kwargs)        # Network and kwargs defined elsewhere
model = nn.DataParallel(model)   # replicate the model across all visible GPUs
model.to(device)

trainloader = DataLoader(ImagesDataset('/ResizedImages', image_size),
                         batch_size=batch_size, shuffle=True,
                         num_workers=int(num_workers))

for epoch in range(EPOCHS):
    for img in trainloader:
        img_ = img.to(device)
        loss = model(img_)  # model returns the loss directly
        opt.zero_grad()     # opt: optimizer defined elsewhere
        loss.backward()
        opt.step()

[Screenshot (2022-08-20): nvidia-smi output showing activity on only one GPU]

When I run nvidia-smi after training, I get the output shown in the screenshot, meaning my training only uses one GPU. Is this because my network is minimal and training does not require that much memory (80 GB x 3), or am I getting something wrong?

Many thanks!

nn.DataParallel creates a memory imbalance on the GPUs, and while yours seems to be quite large, the other devices could still be in use.
You can add debug print statements to the forward method of the model and check the current device of the input tensor via print(x.device) to see which GPU each input chunk lands on.
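For example, a minimal sketch (assuming your Network returns the loss directly, as in your snippet):

class Network(nn.Module):
    def forward(self, x):
        # nn.DataParallel scatters the batch along dim 0, so with 3 visible
        # GPUs you should see cuda:0, cuda:1, and cuda:2 printed here
        print(x.device)
        ...  # rest of the forward pass computing the loss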
Generally, we recommend using DistributedDataParallel for the best performance and to avoid the memory imbalance.
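Here is a minimal sketch of the DDP setup with one process per GPU. Network, ImagesDataset, and the hyperparameters are assumed to be defined as in your snippet, and the Adam optimizer is a placeholder:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(rank, world_size):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = Network(**kwargs).to(rank)   # Network/kwargs as in your snippet
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.Adam(model.parameters())  # placeholder optimizer

    dataset = ImagesDataset('/ResizedImages', image_size)
    sampler = DistributedSampler(dataset, num_replicas=world_size,
                                 rank=rank, shuffle=True)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler,
                        num_workers=int(num_workers))

    for epoch in range(EPOCHS):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for img in loader:
            img = img.to(rank)
            loss = model(img)
            opt.zero_grad()
            loss.backward()
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # 3 in your case
    mp.spawn(train, args=(world_size,), nprocs=world_size)

Each process loads only its own shard of the dataset via the DistributedSampler and holds a full model replica on its own GPU, so the memory usage stays balanced across the devices.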


Hi ptrblck, thanks!
You were right about the imbalance created by nn.DataParallel.
I later switched to using DDP.