Run PyTorch on Multiple GPUs


Just a newbie question on running PyTorch on multiple GPUs.
If I simply specify this:

device = torch.device("cuda:0"),

this only runs on a single GPU, right?

If I have multiple GPUs and want to utilize all of them, what should I do?
Will the command below automatically utilize all GPUs for me?

    use_cuda = not args.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

Wrapping your model in nn.DataParallel is an easy way to use your GPUs.
Have a look at the parallelism tutorial.
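
As a rough illustration of the suggestion above, here is a minimal sketch of wrapping a model in nn.DataParallel. The toy network and the tensor sizes are made up for the example and are not from the original post; the code falls back to a single device when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap only when more than one GPU is visible; otherwise keep the plain module.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # device_ids=None means all visible GPUs
model = model.to(device)

# The whole batch is created on the default device; when the model is wrapped,
# DataParallel splits it along dim 0 and sends one chunk to each GPU.
batch = torch.randn(8, 16, device=device)
out = model(batch)
print(out.shape)
```

Note that torch.device("cuda") on its own only selects the current GPU; it is the DataParallel wrapper that spreads a batch over several devices.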


Yes, I have browsed through the topic, but I didn’t find anything that answers the multiple-GPU question.


This tutorial might explain it better. Let me know, if this helps you.


Hi @ptrblck

I am trying to run this project: with multiple GPUs. It stops after uploading the videos. Any suggestions?

Does this code run with a single GPU?
If so, could you try to set num_workers=0 for the DataLoaders in the multi GPU setup and try it again?
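
For reference, a small self-contained sketch of a DataLoader with num_workers=0, which loads every batch in the main process instead of in worker subprocesses. The dataset here is a hypothetical stand-in, not the one used by the project in question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 32 samples with 10 features each.
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))

# num_workers=0 means no worker subprocesses are started; batches are
# produced in the main process, which makes loading problems easier to spot.
loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=False,
    num_workers=0,
    pin_memory=torch.cuda.is_available(),  # pinning only matters with CUDA
)

num_batches = sum(1 for _ in loader)
print(num_batches)
```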

Yes, it works with a single GPU, but only when training with the resnet18 architecture and a smaller batch size, not with resnet152 and the original batch size described by the author.

So using a single GPU the code also gets stuck for resnet152 and the original batch size?

Yes, you understand me correctly.

Do you mean the DataLoader that the author uses for val_loader and train_loader, which looks like this:

batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)

So I should change both val_loader and train_loader like this:

batch_size=args.batch_size, shuffle=False,
num_workers=0, pin_memory=True)

Sorry, I am new to deep learning and PyTorch.

Yes, I meant exactly this line of code. :wink:
Could you try that? Your error seems a bit strange, though, as resnet18 is running while resnet152 gets stuck.

The author used a Tesla P100 GPU (FYI).

It gave an error. I also tried both of the following, together with the changes you described:

model = torch.nn.DataParallel(model, device_ids=None).cuda()

model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()

Your GPU is out of memory. Probably the model is just too large for your GPU.
You could try to use torch.utils.checkpoint to trade compute for memory.
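
A minimal sketch of what using torch.utils.checkpoint could look like. The toy network below is hypothetical and only stands in for the real model, and the use_reentrant keyword assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedNet(nn.Module):
    """Hypothetical toy network; the real model would be much larger."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.head = nn.Linear(64, 10)

    def forward(self, x):
        # Intermediate activations inside each checkpointed block are not
        # kept; they are recomputed during the backward pass.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedNet()
x = torch.randn(8, 64)
out = model(x)
out.sum().backward()

# Gradients still reach every parameter despite the recomputation.
grad_ok = all(p.grad is not None for p in model.parameters())
print(out.shape, grad_ok)
```

The trade-off is extra forward computation during backward in exchange for lower peak activation memory; it does not shrink the parameters themselves.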


OK Thank you so much for your help.

I had a look at data_parallel_tutorial and parallelism_tutorial.
I have written my code, but it fails to use multiple GPUs.

opt.gpus = [0, 1]

class Model(nn.Module):
  def sample(self, input):
    input = input.cuda()
    output = self.other_function(input)
    return output

if len(opt.gpus) > 1:
    print("Let's use", len(opt.gpus), "GPUs!")
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in opt.gpus)

model = Model()
if len(opt.gpus) > 1:
    model = nn.DataParallel(model, device_ids=opt.gpus, dim=0)
if use_cuda:
    model = model.cuda()
out = model.module.sample(data)

data.shape : (Batch_size, time_step)

However, the model only uses GPU 0 when I run my code.
I don’t know why.
Could you explain it?

How large is your batch size?
Using nn.DataParallel, your data will be split in dim0 and each chunk of data will be sent to a device.
If the batch size is too small, some GPUs won’t be able to get a data chunk.
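
To make the splitting concrete, here is a small CPU-only sketch that uses torch.chunk as an approximation of how a batch is divided along dim 0 for two devices. The tensor sizes are hypothetical, and torch.chunk is only a stand-in for the wrapper's internal scatter step.

```python
import torch

num_devices = 2  # assumed number of GPUs for this illustration

# A batch of 8 samples is divided evenly along dim 0.
batch = torch.randn(8, 100)
chunks = torch.chunk(batch, num_devices, dim=0)
chunk_sizes = [c.shape[0] for c in chunks]
print(chunk_sizes)

# A batch of 1 cannot be divided: only one chunk comes back,
# so a second device would receive nothing.
tiny = torch.randn(1, 100)
tiny_chunks = torch.chunk(tiny, num_devices, dim=0)
print(len(tiny_chunks))
```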

Thank you for the quick reply (appreciated).
My batch_size is 8,
because each train_data sample is very long.
I get an out-of-memory error when I use only one GPU,
so I plan to use multiple GPUs to train my model.
However, the model only uses GPU 0 when gpus = [0, 1].

In general, what should batch_size be?
Could you give me some advice? Thank you.

If your batch size is 8, each GPU should get 4 samples.
Do you see GPU1 completely empty, without any utilization, in nvidia-smi or a similar tool?

Sorry for the late reply; I am in a different time zone.

I use watch -n 1 -d nvidia-smi while I run my model,
and I see that GPU1 is completely empty.
I am confused about why this happens.
Can you explain it, based on your experience?

I printed the shape of the data tensor, (time_step, Batch_size_data, embedding_dim),
before out = model.module.sample(data).
I also printed the shape of the output tensor, (time_step, Batch_size_output, hidden_dim), after other_function.

class Model(nn.Module):
  def sample(self, input):
    input = input.cuda()
    output = self.other_function(input)
    return output

However, Batch_size_data == Batch_size_output.
In theory, I think Batch_size_data should be equal to 2*Batch_size_output when using GPU0 and GPU1.

Note that I changed the parameter setting to:
model = nn.DataParallel(model, device_ids=opt.gpus, dim=1)

I found that just renaming the function solves my problem.

I changed this part:

class Model(nn.Module):
  def forward(self, input):
    input = input.cuda()
    output = self.other_function(input)
    return output
out = model(data)
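
The reason the rename helps, as far as I understand nn.DataParallel: the wrapper only scatters the input when the wrapper itself is called, which dispatches to forward on each replica. A call such as model.module.sample(data) goes straight to the underlying module and bypasses the splitting. Below is a minimal sketch of the difference; the layer and tensor sizes are hypothetical, and the code also runs on a machine without GPUs, where no splitting takes place.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 4)  # hypothetical layer for illustration

    def forward(self, input):
        # With several GPUs, each replica reports only its own chunk here.
        print("forward got", input.shape[0], "samples on", input.device)
        return self.linear(input)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Model()).to(device)
data = torch.randn(8, 16, device=device)

# Goes through DataParallel: the batch is scattered across visible GPUs.
out = model(data)

# Bypasses DataParallel: the underlying module sees the full batch on one device.
direct = model.module.forward(data)
print(out.shape, direct.shape)
```

With two visible GPUs, the first call should print two lines with 4 samples each, while the second call prints a single line with all 8 samples.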