Custom forward using multiple GPUs

Hi everyone :slight_smile:

I am trying to run my code on multiple GPUs. According to this tutorial, this is as easy as passing my model into a function with the corresponding GPU IDs I would like to use.

However, if I have a model that uses a custom forward method with a for loop, will that be handled correctly by the multiple GPUs?

Also: where do I send the images and labels from the batches to? When using a single GPU I do something like:

for batch in loader:
   # unpack a batch and throw the images and labels on a specific device
   images = batch[0].to(device)
   labels = batch[1].to(device)

What does that look like for multiple GPUs?

Any help is very much appreciated!

All the best

You can pass anything you want. The only constraint is that all tensors involved in an operation must be on the same device. You can write a hybrid forward and mix everything as you like.
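A minimal sketch of what that means for a forward with a for loop, assuming a hypothetical `LoopModel` (not from the thread). Under `nn.DataParallel` every GPU gets its own replica of the module and runs the whole `forward`, loop included, on its chunk of the batch. The only thing to watch is that tensors created inside `forward` follow the input's device. The sketch falls back to a single GPU or the CPU when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn

class LoopModel(nn.Module):
    """Hypothetical model whose forward applies the same layer several times in a loop."""

    def __init__(self, dim=16, steps=3):
        super().__init__()
        self.layer = nn.Linear(dim, dim)
        self.steps = steps

    def forward(self, x):
        # Accumulator is created on the same device as the incoming chunk,
        # so each replica only touches tensors on its own GPU.
        acc = torch.zeros_like(x)
        for _ in range(self.steps):
            x = torch.relu(self.layer(x))
            acc = acc + x
        return acc

model = LoopModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate across all visible GPUs
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

out = model(torch.randn(8, 16).to(device))
print(out.shape)
```

The loop itself needs no special treatment: it is ordinary Python that each replica executes independently.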

Thank you @JuanFMontesinos!

But how do I specify certain (multiple) GPUs as my device? Say I have access to 8 GPUs but can only run my code on GPUs 7 and 8 because the other ones are occupied. How do I tell PyTorch to use only GPUs 7 and 8, and then also send the images and labels to the corresponding GPUs?

There are two options:

  1. You want to run an independent process on each GPU (like two training pipelines in parallel).
  2. You want to use several gpus to train a single experiment.

I think you are asking for the case 2.
So there are two options here. The easiest one is using the CUDA_VISIBLE_DEVICES environment variable.
This environment variable works for any process in your OS: it masks the devices so that the launched process can only see the GPUs you allow.
The other option is to let the process see all 8 GPUs and choose which ones you want to parallelize over.
The signature of DataParallel is:

torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

You can pass device_ids=[7,8].
The first option (the environment variable) is preferred, since there is less chance of messing it up.
Note that when you set CUDA_VISIBLE_DEVICES=7,8, PyTorch will only see two GPUs.
Thus, inside Python, their indices will be 0 and 1 instead of 7 and 8.
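A minimal sketch of the environment-variable option done from inside Python, assuming the physical GPUs 7 and 8 from the example above and a stand-in `nn.Linear` model. The variable has to be set before CUDA is initialized; setting it before importing `torch` is the safest place. After that, the two GPUs are addressed with the remapped indices 0 and 1.

```python
import os

# Must be set before CUDA is initialized; setting it before importing torch is safest.
# "7,8" are the physical indices from the example above.
os.environ["CUDA_VISIBLE_DEVICES"] = "7,8"

import torch
import torch.nn as nn

# Inside this process the visible GPUs are renumbered as cuda:0 and cuda:1.
n_visible = torch.cuda.device_count()
print("visible GPUs:", n_visible)

model = nn.Linear(32, 10)  # stand-in for your own model
if n_visible > 1:
    # Use the remapped indices 0 and 1, not 7 and 8.
    model = nn.DataParallel(model, device_ids=[0, 1]).to("cuda:0")
```

On a machine without those GPUs, `n_visible` is simply smaller and the model stays unwrapped.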

In short, DataParallel automatically distributes the inputs of your model across the GPUs, splitting the batch into as many chunks as GPUs you have chosen.
So you don't really need to take care of allocating them yourself. You just need to make sure that everything inside your nn.Module is written without hard-coded devices. That is, if you need to create a tensor of zeros,
you create it on the same device as the input tensor (e.g. by passing device=x.device)
instead of on a fixed device such as 'cuda:0'.
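A minimal sketch of the soft-coding idea, using a hypothetical `HiddenStateModel` that needs a fresh zero tensor in every forward pass. The zeros follow the device and dtype of the chunk each replica receives. The commented-out line shows the hard-coded variant that would put every replica's tensor on GPU 0 and cause a device mismatch on the other replicas.

```python
import torch
import torch.nn as nn

class HiddenStateModel(nn.Module):
    """Hypothetical recurrent-style module; input shape is (batch, seq, dim)."""

    def __init__(self, dim=8):
        super().__init__()
        self.cell = nn.Linear(2 * dim, dim)
        self.dim = dim

    def forward(self, x):
        # Soft-coded: same device and dtype as the input chunk of this replica.
        h = torch.zeros(x.size(0), self.dim, device=x.device, dtype=x.dtype)
        # Hard-coded variant to avoid under DataParallel:
        # h = torch.zeros(x.size(0), self.dim).to("cuda:0")
        for t in range(x.size(1)):
            h = torch.tanh(self.cell(torch.cat([x[:, t], h], dim=1)))
        return h

model = HiddenStateModel()
x = torch.randn(4, 5, 8)
out = model(x)
print(out.shape)
```

The same module runs unchanged on the CPU, on one GPU, or wrapped in `nn.DataParallel`.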

Yes you are correct, I am asking for the case 2. :slight_smile:

(1) I am running a Jupyter notebook on a remote server. Is it possible to start it with something like this?

CUDA_VISIBLE_DEVICES=7,8 jupyter notebook --no-browser

(2) Also, do I understand correctly that this command makes my notebook use the GPUs with indices 7 and 8 on the server (let’s say I have 9 GPUs then, otherwise there is no index 8 :wink: ), which I can check with e.g. nvidia-smi, but that inside my notebook they will have indices 0 and 1?
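One way to check this from inside the notebook is the small sketch below, which prints the value the launching shell exported and the names of the GPUs the process can see. One caveat worth knowing: by default CUDA may number GPUs in a different order than nvidia-smi does. Launching with `CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=7,8 jupyter notebook --no-browser` makes CUDA use the same PCI bus order as nvidia-smi, so "7,8" refers to the GPUs nvidia-smi labels 7 and 8.

```python
import os
import torch

# Which physical GPUs did the launching shell expose? (None means "all of them")
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("CUDA_DEVICE_ORDER =", os.environ.get("CUDA_DEVICE_ORDER"))

# Inside the process, visible GPUs are numbered 0..n-1 regardless of their physical index.
visible = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
for i, name in enumerate(visible):
    print(f"cuda:{i} -> {name}")
```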

(3) And then I use the command:

torch.nn.DataParallel(model, device_ids=[0, 1], output_device=None, dim=0)

to access the previously defined GPUs?

(4) And how do I need to define the device inside my notebook? When using a single GPU I can say
device = torch.device('cuda') and then send everything to it with .to(device).
What do I have to put in the brackets instead of 'cuda'?
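A minimal sketch of the training loop for this setup, assuming the notebook was launched with CUDA_VISIBLE_DEVICES=7,8 and using a stand-in model and hypothetical random data. With `nn.DataParallel` the model's parameters must live on `device_ids[0]`, and the outputs are gathered there by default, so the whole batch (images and labels) goes to `cuda:0` exactly as in the single-GPU loop. Keeping `torch.device('cuda')` would also work, since it resolves to the current device, which is `cuda:0` by default; the explicit index just makes the link to `device_ids[0]` visible. The sketch falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# With CUDA_VISIBLE_DEVICES=7,8 the two GPUs appear as cuda:0 and cuda:1.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))  # stand-in model
if torch.cuda.device_count() > 1:
    # Parameters must live on device_ids[0]; outputs are gathered there too.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.to(device)

# Hypothetical random data standing in for a real dataset.
dataset = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))
loader = DataLoader(dataset, batch_size=16)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for batch in loader:
    # Same as the single-GPU loop: send the whole batch to the primary device;
    # DataParallel splits the images across the GPUs inside forward().
    images = batch[0].to(device)
    labels = batch[1].to(device)

    optimizer.zero_grad()
    outputs = model(images)  # gathered back on the primary device
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
```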

Sorry for all the questions, I am relatively new to using multiple GPUs on a remote server :see_no_evil: