Run PyTorch on Multiple GPUs

(Andre) #1


Just a newbie question on running PyTorch on multiple GPUs.
If I simply specify this:

device = torch.device("cuda:0")

this only runs on a single GPU, right?

If I have multiple GPUs and I want to utilize ALL of them, what should I do?
Will the command below automatically utilize all GPUs for me?

    use_cuda = not args.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")


(ptrblck) #2

Wrapping your model in nn.DataParallel is an easy way to use all your GPUs.
Have a look at the parallelism tutorial.
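A minimal sketch of that approach (the tiny `nn.Linear` model here is just a placeholder for illustration): `nn.DataParallel` replicates the model on each visible GPU and scatters every input batch across them, and the same code still runs on a single GPU or on the CPU.

```python
import torch
import torch.nn as nn

# Placeholder model for illustration
model = nn.Linear(10, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across all visible GPUs
    model = nn.DataParallel(model)
model = model.to(device)

x = torch.randn(8, 10).to(device)  # this batch gets scattered across the GPUs
out = model(x)
print(out.shape)  # torch.Size([8, 2])
```

Note that the batch dimension is split, so each GPU sees `batch_size / num_gpus` samples per forward pass.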

(Andre) #3

Yes, I have browsed through that topic, but I didn't find information answering the multiple-GPU question.


(ptrblck) #4

This tutorial might explain it better. Let me know if this helps you.

(KanZa) #5

Hi @ptrblck

I am trying to run this project with multiple GPUs. It stops after uploading the videos. Any suggestions?


(ptrblck) #6

Does this code run with a single GPU?
If so, could you try to set num_workers=0 for the DataLoaders in the multi-GPU setup and try it again?
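As a sketch of that debugging step (the `TensorDataset` here is a stand-in for the project's actual video dataset): setting `num_workers=0` makes the DataLoader fetch data in the main process, which rules out deadlocks in worker subprocesses as the cause of the hang.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the real project would use its own video Dataset
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# num_workers=0 loads batches in the main process, so any hang or error
# surfaces directly instead of being hidden inside a worker subprocess
loader = DataLoader(dataset, batch_size=16, shuffle=False, num_workers=0)

batch_x, batch_y = next(iter(loader))
print(batch_x.shape)  # torch.Size([16, 3])
```

If the code runs with `num_workers=0` but hangs with workers enabled, the problem is in the data-loading workers rather than in the model or the multi-GPU setup.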

(KanZa) #7

Yes, it works with a single GPU, but only when training with the resnet18 architecture and a smaller batch size, not with resnet152 and the original batch size described by the author.


(ptrblck) #8

So using a single GPU, the code also gets stuck for resnet152 and the original batch size?

(KanZa) #9

Yes, you understood me correctly.

(KanZa) #10

Do you mean this? The author is using a DataLoader for both val_loader and train_loader, like this:

batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)

So I should change both val_loader and train_loader like this:

batch_size=args.batch_size, shuffle=False,
num_workers=0, pin_memory=True)

Sorry, I am new to deep learning and PyTorch.


(ptrblck) #11

Yes, I meant exactly this line of code. :wink:
Could you try that? Your error seems a bit strange, though, as resnet18 runs while resnet152 gets stuck.

(KanZa) #12

The author has used Tesla P100 GPU (FYI).

It still gives an error. I have also tried both of the following, together with the changes you described:

model = torch.nn.DataParallel(model, device_ids=None).cuda()

model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()


(ptrblck) #13

Your GPU is out of memory; the model is probably just too large for your GPU.
You could try torch.utils.checkpoint to trade compute for memory.
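A minimal sketch of that idea, using small `nn.Sequential` blocks as stand-ins for segments of a large network like resnet152: `torch.utils.checkpoint.checkpoint` discards the intermediate activations of a block during the forward pass and recomputes them during backward, lowering peak memory at the cost of extra compute.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Two small blocks standing in for segments of a large model
block1 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())

x = torch.randn(4, 64, requires_grad=True)

# Activations inside each checkpointed block are dropped after the
# forward pass and recomputed during backward, saving memory
h = checkpoint(block1, x, use_reentrant=False)
out = checkpoint(block2, h, use_reentrant=False)

out.sum().backward()  # gradients flow through the recomputed blocks
print(out.shape)  # torch.Size([4, 64])
```

In a real model you would typically checkpoint the largest blocks (e.g. the residual stages), since those dominate activation memory.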

(KanZa) #14

OK, thank you so much for your help.