MultiGPU forward pass

So, I'm using an IPython notebook and doing something like this:

%%bash
CUDA_VISIBLE_DEVICES=1,2,3
netG = torch.nn.DataParallel(netG, [2,3,4])
netD = torch.nn.DataParallel(netD, [2,3,4])

%%bash
gpustat
ip-172-31-27-23  Fri Oct 20 02:38:19 2017
[0] Tesla K80        | 56'C,   0 % |   879 / 11439 MB | ubuntu(283M) ubuntu(288M) ubuntu(303M)
[1] Tesla K80        | 35'C,   0 % |     2 / 11439 MB |
[2] Tesla K80        | 46'C,   0 % |   206 / 11439 MB | ubuntu(204M)
[3] Tesla K80        | 42'C,   0 % |   206 / 11439 MB | ubuntu(204M)
[4] Tesla K80        | 47'C,   0 % |   206 / 11439 MB | ubuntu(204M)
[5] Tesla K80        | 38'C,   0 % |     2 / 11439 MB |
[6] Tesla K80        | 47'C,   0 % |     2 / 11439 MB |
[7] Tesla K80        | 41'C,   0 % |     2 / 11439 MB |

Gives me:

RuntimeError: all tensors must be on devices[0]

Is there a way around this in the notebook?

I’m confused. If you want PyTorch to only see GPUs 1, 2 and 3, how can it do DataParallel on 2, 3 and 4?

Did you set CUDA_VISIBLE_DEVICES in the same line as your command, like this:

CUDA_VISIBLE_DEVICES=1 python myscript.py

Could it be that you set it on a separate line before calling your Python script, so that it has no effect and your models are still pushed onto GPUs 2, 3 and 4? That would explain why these three GPUs show the same amount of allocated memory.

However, if you really want to use only GPUs 1, 2 and 3, set the environment variable in the same line and use:

netG = torch.nn.DataParallel(netG, [0, 1, 2])
netD = torch.nn.DataParallel(netD, [0, 1, 2])
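
Putting the two pieces together, here is a minimal sketch of how I would combine them (netG and netD are the models from the original post, myscript.py is just a placeholder name; inside the process the three visible GPUs are renumbered 0, 1, 2):

CUDA_VISIBLE_DEVICES=1,2,3 python myscript.py

# inside myscript.py
import torch

netG = netG.cuda(0)  # device 0 inside the process is the first visible GPU (physical GPU 1)
netD = netD.cuda(0)
netG = torch.nn.DataParallel(netG, device_ids=[0, 1, 2])
netD = torch.nn.DataParallel(netD, device_ids=[0, 1, 2])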

Could you check in which line the error was thrown and paste that code snippet?
You have to make sure that all operations are executed on the same GPU.

For example, if your generator and discriminator are placed on different GPUs, you have to push the Variables to the corresponding GPU:

netG = netG.cuda(0)
netD = netD.cuda(1)

outG = netG(Variable(data.cuda(0)))
outG = outG.cuda(1) # push to GPU 1

outD = netD(outG)

etc.
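
And for the DataParallel case from your first post: as far as I know, the wrapped module's parameters have to sit on device_ids[0] before wrapping, which is exactly what "RuntimeError: all tensors must be on devices[0]" complains about. A minimal sketch, reusing netG, netD and data from above with the renumbered device IDs:

from torch.autograd import Variable

device_ids = [0, 1, 2]

netG = torch.nn.DataParallel(netG.cuda(device_ids[0]), device_ids=device_ids)
netD = torch.nn.DataParallel(netD.cuda(device_ids[0]), device_ids=device_ids)

# the input batch is scattered across the GPUs; starting it on device_ids[0] is the safe choice
inp = Variable(data.cuda(device_ids[0]))
outG = netG(inp)
outD = netD(outG)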

I hope it helps!

I'm running in a notebook, so there's no notion of running the script like that. I see what the issue might be there, but I think I've handled that. What happens for me when I use DataParallel is that the first N-1 GPUs have memory allocated to them, but the program keeps waiting for the Nth GPU to have the requisite memory allocated… Not sure if that's a common issue?

Well, if you are running your script in a notebook, you can also add these lines at the beginning (before importing torch etc.):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
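
Since the order matters in a notebook, a sketch of the whole cell might look like this (the device_count() call is just a sanity check I would add):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"  # must run before torch touches CUDA

import torch
print(torch.cuda.device_count())  # should print 3 if the setting took effect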

So that's what I did, except for some reason the utilization never goes above 0%. I couldn't really figure that out.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5"

[0] Tesla K80        | 49'C,   0 % |  1356 / 11439 MB | ubuntu(650M) ubuntu(701M)
[1] Tesla K80        | 41'C,   0 % |   214 / 11439 MB | ubuntu(210M)
[2] Tesla K80        | 47'C,   0 % |   214 / 11439 MB | ubuntu(210M)
[3] Tesla K80        | 42'C,   0 % |   214 / 11439 MB | ubuntu(210M)
[4] Tesla K80        | 47'C,   0 % |   214 / 11439 MB | ubuntu(210M)
[5] Tesla K80        | 42'C,   0 % |   214 / 11439 MB | ubuntu(210M)
[6] Tesla K80        | 42'C,   0 % |     2 / 11439 MB |
[7] Tesla K80        | 36'C,   0 % |     2 / 11439 MB |

if typ == 'RNN':
    netD = torch.nn.DataParallel(netD, list(range(6)))