Is there a tutorial on using multiple GPUs? So far I have only seen a short tutorial that says to wrap the module in DataParallel, but is that all one needs to do? Just write the model normally and then call
model = DataParallel(model).cuda()?
In the imagenet example, I have seen the use of distributed sampler when loading the training data. Is that something we need to care about?
There’s a little more nuance to it if you want to control the exact GPUs that you parallelize over. If you look at the doc for DataParallel you’ll see that you can specify device_ids. If you do that, you’ll also want to make sure you load all of your variables onto the same GPU to start with (with your_variable.cuda(ID), where ID is the first entry of device_ids). That should be pretty much it.
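A minimal sketch of the device_ids route described above, not a definitive recipe. The toy nn.Linear model, the batch shape, and the choice of GPUs 0 and 1 are assumptions made only for illustration; the sketch falls back to the CPU when those GPUs are not present.

```python
import torch
import torch.nn as nn

wanted_ids = [0, 1]                   # hypothetical choice: only parallelize over GPUs 0 and 1
n_gpus = torch.cuda.device_count()    # 0 on a machine without CUDA
device_ids = [i for i in wanted_ids if i < n_gpus]

model = nn.Linear(10, 2)              # hypothetical toy model
batch = torch.randn(8, 10)            # hypothetical batch of 8 samples

if device_ids:
    # The wrapped module's parameters must live on device_ids[0];
    # the input is placed on that same GPU to start with.
    primary = torch.device("cuda", device_ids[0])
    model = nn.DataParallel(model.to(primary), device_ids=device_ids)
    batch = batch.to(primary)

output = model(batch)                 # the batch is split along dim 0 across device_ids
print(output.shape)
```

The output is gathered back onto the first GPU in device_ids, so the loss can be computed there as usual.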
Thanks for your answer! It’s not clear to me what you mean by
“If you do that, you’ll also want to make sure you load all of your variables onto the same GPU to start with”
Which GPU are you talking about? I have, say, 3 of them, that I want to use. I have the input to the network; should I just call
input = input.cuda()
model = DataParallel(model)  # I want to use all available GPUs anyhow
output = model(input)
or do something else?
Yes, basically if you have 3 GPUs but only want to use 2 of them then you’d have to specify which ones you want to use. Otherwise you can call cuda() on the model and the Variables. Do keep in mind that unless you’re running this on a dedicated headless server, one of your GPUs may be tied up displaying your desktop etc. so you might get strange errors.
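For the "use every GPU" case, something along these lines should work. This is only a sketch: the nn.Linear model and the input shape are made up for illustration, and it simply stays on the CPU if no GPU is available.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)        # hypothetical toy model
inputs = torch.randn(6, 10)     # hypothetical batch of 6 samples

if torch.cuda.is_available():
    # With no device_ids argument, DataParallel uses every visible GPU.
    # A plain .cuda() call places the model and input on the current device (GPU 0 by default).
    model = nn.DataParallel(model.cuda())
    inputs = inputs.cuda()

output = model(inputs)
print(output.shape)
```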
So I can specify multiple GPUs in the .cuda() call? Because if I have to specify only one, which one should I specify?
In any case I usually use all the GPUs, and if I want to restrict PyTorch's access I just set CUDA_VISIBLE_DEVICES to the devices I want to use.
Yes, you can pass in a list of GPU ids, but as the device_ids argument of DataParallel; .cuda() itself takes a single device. Take a look at the API for details.
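The CUDA_VISIBLE_DEVICES approach mentioned above can also be done from inside the script, as long as the variable is set before CUDA is initialized. A small sketch, where the choice of physical GPUs 0 and 2 is purely hypothetical:

```python
import os

# Set this before the first CUDA call (safest: before importing torch).
# Hypothetical choice: expose only physical GPUs 0 and 2 to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"

# Inside this process the exposed GPUs are renumbered from 0,
# so they would appear as cuda:0 and cuda:1.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)
```

The same thing can be done from the shell by setting the variable when launching the script, which avoids touching the code at all.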