Run PyTorch on Multiple GPUs


(Andre) #1

Hello

Just a newbie question about running PyTorch on multiple GPUs.
If I simply specify this:

device = torch.device("cuda:0")

this only runs on a single GPU, right?

If I have multiple GPUs and I want to utilize ALL OF THEM, what should I do?
Will the command below automatically utilize all GPUs for me?

    use_cuda = not args.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

#2

Wrapping your model in nn.DataParallel is an easy way to use your GPUs.
Have a look at the parallelism tutorial.
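
A minimal sketch of that pattern (the nn.Linear and the random batch are just placeholders for your own model and data):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 4)               # stand-in for your own nn.Module
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs")
    model = nn.DataParallel(model)     # splits each input batch across the GPUs
model.to(device)

data = torch.randn(8, 16).to(device)   # dummy batch; DataParallel scatters it
output = model(data)                   # each GPU processes its own chunk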


(Andre) #3

Yes, I have browsed through that topic, but I didn't find information answering the multiple-GPU question.


#4

This tutorial might explain it better. Let me know if this helps you.


(KanZa ) #5

Hi @ptrblck

I am trying to run this project: https://github.com/chaoyuaw/pytorch-coviar/blob/master/train.py with multiple GPUs. It gets stuck after loading the videos. Any suggestions?


#6

Does this code run with a single GPU?
If so, could you try setting num_workers=0 for the DataLoaders in the multi-GPU setup and run it again?


(KanZa ) #7

Yes, it works with a single GPU, but only when training with the resnet18 architecture and a smaller batch size, not with resnet152 and the original batch size described by the author.


#8

So using a single GPU, the code also gets stuck with resnet152 and the original batch size?


(KanZa ) #9

Yes, you understood me correctly.


(KanZa ) #10

Do you mean the DataLoaders in train.py that the author uses for val_loader and train_loader, which look like this:

batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)

Should I change both val_loader and train_loader to this:

batch_size=args.batch_size, shuffle=False,
num_workers=0, pin_memory=True)

Sorry, I am new to deep learning and PyTorch.


#11

Yes, I meant exactly this line of code. :wink:
Could you try that? Your error seems a bit strange, though, since resnet18 runs while resnet152 gets stuck.


(KanZa ) #12

The author used a Tesla P100 GPU (FYI).

It still gives an error. I have also tried both of these, together with the changes you described:

model = torch.nn.DataParallel(model, device_ids=None).cuda()

model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()


#13

Your GPU is out of memory. The model is probably just too large for your GPU.
You could try to use torch.utils.checkpoint to trade compute for memory.
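
As a rough illustration, here is a minimal sketch of checkpoint_sequential with a toy nn.Sequential model (not your resnet152); it trades compute for memory by recomputing intermediate activations during the backward pass:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy sequential model just to show the API; replace it with your own model.
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
).cuda()

x = torch.randn(8, 1024, device="cuda", requires_grad=True)

# Split the forward pass into 2 segments; only the segment boundaries are
# stored, and the activations in between are recomputed during backward.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()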


(KanZa ) #14

OK, thank you so much for your help.


#16

Hi,
I have had a look at the data_parallel_tutorial and the parallelism_tutorial,
and I have written my code, but it fails to use all GPUs.

With opt.gpus = [0, 1]:

class Model(nn.Module):
    def sample(self, input):
        input = input.cuda()
        output = self.other_function(input)
        return output

if len(opt.gpus) > 1:
    print("Let's use", len(opt.gpus), "GPUs!")
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in opt.gpus)
else:
    torch.cuda.set_device(opt.gpus[0])
torch.cuda.manual_seed(opt.seed)

model = Model()
if len(opt.gpus) > 1:
    model = nn.DataParallel(model, device_ids=opt.gpus, dim=0)
if use_cuda:
    model = model.cuda()

model.train()
out = model.module.sample(data)

data.shape : (Batch_size, time_step)

However, the model only uses GPU 0 when I run my code.
I don't know why.
Could you explain it?
Thanks.


#17

How large is your batch size?
Using nn.DataParallel, your data will be split along dim0 and each chunk will be sent to a device.
If the batch size is too small, some GPUs won't get a data chunk.
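
A minimal sketch of that behavior (assumes at least two visible GPUs; Net is just a toy module, not your model):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        # Each replica only sees its chunk of the batch.
        print("device:", x.device, "chunk size:", x.size(0))
        return self.fc(x)

model = nn.DataParallel(Net()).cuda()
x = torch.randn(8, 16).cuda()    # batch size 8, on the default GPU
out = model(x)                   # with 2 GPUs: two chunks of 4 samples each
print(out.shape)                 # torch.Size([8, 4]), gathered back on GPU 0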


#18

Thank you for the quick reply (appreciated).
My batch_size is 8,
because the training samples are very long.
I get an out-of-memory error when I use only one GPU,
so I plan to use multiple GPUs to train my model.
However, the model only uses GPU 0 when gpus = [0, 1].

In general, what should the batch size be equal to?
Could you give me some advice? Thank you.


#19

If your batch size is 8, each GPU should get 4 samples.
Do you see GPU1 completely empty, without any utilization, in nvidia-smi or a similar tool?


#20

Sorry, I have been late because of the time zone difference.

I use watch -n 1 -d nvidia-smi when I run my model.
However, I see that GPU1 is completely empty.
I am confused why this happens.
Can you explain it from your experience?
Thanks.

Besides,
I print the shape of data, which is (time_step, Batch_size_data, embedding_dim),
before out = model.module.sample(data),
and I also print the shape of the output, which is (time_step, Batch_size_output, hidden_dim), after other_function.

class Model(nn.Module):
    def sample(self, input):
        input = input.cuda()
        output = self.other_function(input)
        return output

However, Batch_size_data == Batch_size_output.
In theory, I think Batch_size_data should be equal to 2 * Batch_size_output when using GPU0 and GPU1.

Note that I modified the parameter setting to:
model = nn.DataParallel(model, device_ids=opt.gpus, dim=1)


#21

I found that just renaming the function solves my problem: calling out = model(data) goes through DataParallel's forward, which scatters the input across the GPUs, whereas model.module.sample(data) bypassed the DataParallel wrapper entirely.

I changed this part:

class Model(nn.Module):
    def forward(self, input):
        input = input.cuda()
        output = self.other_function(input)
        return output

model.train()
out = model(data)
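
For completeness, a minimal sketch of the corrected pattern, assuming two visible GPUs and an input shaped (time_step, batch, embedding_dim), so the batch is split with dim=1; the nn.GRU here is just a stand-in for other_function:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(32, 64)        # stand-in for other_function

    def forward(self, x):
        out, _ = self.rnn(x)             # each replica sees half the batch in dim 1
        return out

model = nn.DataParallel(Model(), device_ids=[0, 1], dim=1).cuda()
data = torch.randn(10, 8, 32).cuda()     # (time_step=10, batch=8, embedding_dim=32)
out = model(data)                        # scattered along dim 1, gathered back
print(out.shape)                         # torch.Size([10, 8, 64])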