Can I specify a GPU other than GPU 0?

There are 4 GPUs on my machine. GPUs 0 and 1 are running someone else's code with nearly full memory usage.

So I want to use GPUs 2 and 3. But when I run my code, it still reports:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 10.92 GiB total capacity; 10.21 GiB already allocated; 89.50 MiB free; 9.64 MiB cached)

Here is my DataParallel code:

os.environ['CUDA_DEVICE_ORDER']='PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES']='2,3'
#this function is listed below
model=DataParallelModel(model,cuda=0,device_ids=[0,1],output_device=0)
device=torch.device('cuda:0')

# This DataParallelModel function wraps the DataParallel class
def DataParallelModel(model,**kwargs):
        if 'device_ids' in kwargs.keys():
            device_ids = kwargs['device_ids']
        else:
            device_ids = None
        if 'output_device' in kwargs.keys():
            output_device = kwargs['output_device']
        else:
            output_device = None
        if 'cuda' in kwargs.keys():
            cudaID = kwargs['cuda']
            device=torch.device('cuda:{}'.format(cudaID))
            model = torch.nn.DataParallel(model, device_ids=device_ids, output_device=output_device).to(device)
        else:
            model = torch.nn.DataParallel(model, device_ids=device_ids, output_device=output_device).cuda()
        return model

These environment-variable lines do work; they make my code see only GPUs 2 and 3. But it still fails with the error shown above.
My PyTorch version is the newly released 1.0; I upgraded yesterday morning.

If you don't need to specify a GPU ordering for any special reason, just use

model=DataParallelModel(model)

It's agnostic to device IDs.
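For reference, a minimal sketch of that device-agnostic version (the nn.Linear model is just a placeholder; the point is that with CUDA_VISIBLE_DEVICES='2,3' the process only sees two devices, which it numbers cuda:0 and cuda:1):

import os
# must be set before the first CUDA call in the process
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
# device_ids defaults to all visible devices, i.e. [0, 1] here (physical GPUs 2 and 3)
model = torch.nn.DataParallel(model).cuda()
print(torch.cuda.device_count())  # expected: 2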

I modified it, but it still doesn't work.
I changed every "to(device)" to "cuda()" and used model=DataParallelModel(model).
It reports:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 10.92 GiB total capacity; 10.21 GiB already allocated; 89.50 MiB free; 9.64 MiB cached)

If that's the case, are you sure you are choosing the correct devices? You can check whether you can manually allocate tensors:
a = torch.rand(100).cuda(idx), manually changing idx,

and check whether the idx matches the nvidia-smi order, or in your case the PCI order.
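For example, something along these lines (torch.cuda.get_device_name is only there to label the output):

import torch

# allocate a small tensor on each visible device and compare the reported
# name/index with what nvidia-smi shows
for idx in range(torch.cuda.device_count()):
    a = torch.rand(100).cuda(idx)
    print(idx, torch.cuda.get_device_name(idx), a.device)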

The result is as follows:
Without os.environ['CUDA_DEVICE_ORDER']='PCI_BUS_ID', torch.cuda uses the same device order as nvidia-smi.
With this environment variable set, the CUDA order and the nvidia-smi order differ:

cuda(0)----gpu2
cuda(1)----gpu3
cuda(2)----gpu0
cuda(3)----gpu1

and the same error is reported.

By "the same error" I mean: I commented out the PCI_BUS_ID line so the CUDA order matches the nvidia-smi order, left the other code unchanged, and torch.rand did create tensors on the specified GPUs. But the same error occurred when I ran the training code.

Well, I'm afraid that if you still have this issue then it's a bug, because CUDA_VISIBLE_DEVICES is very standard.

Do you consider it a machine bug, a system bug, or a code bug?
I'm using Ubuntu 16.04 with CUDA 9.0, cuDNN 7.0.5, and PyTorch 1.0.
As mentioned before, the environment variables do make my code see only the selected GPUs and exclude the others, and then I can use .cuda() or .to() to move data onto the visible GPUs and run on them. Is my description of this process correct? I don't know where to start debugging, so I need to confirm that this workflow is right.

That's theoretically correct. If you set CUDA_VISIBLE_DEVICES, PyTorch is unable to see the other GPUs.

Therefore, if you are sure you are not making a mistake when assigning CUDA_VISIBLE_DEVICES, you are facing either a bug or a broken library setup.

Can you check whether, after setting CUDA_VISIBLE_DEVICES, you can no longer access the other GPUs?
You can use torch.cuda.device_count().
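A quick sketch of that check (assuming the variable is set before the first CUDA call; the exact error message may differ):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'  # must happen before CUDA is initialized

import torch

print(torch.cuda.device_count())  # should print 2, not 4

# indices outside the visible range should not be reachable
try:
    torch.rand(10).cuda(2)
except RuntimeError as e:
    print('device 2 is hidden as expected:', e)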

I've debugged it and found the culprit. I'm posting it here hoping to get a clear explanation from you, because I'm really puzzled about why it happens.
It's my forward pass. I create 8 residual blocks named res1 to res8, and in the forward pass I use:

# residual block
        def make_resblock(in_channels, out_channels, kernel_size, stride, padding):
            return nn.Sequential(
                nn.ReflectionPad2d(padding),
                nn.Conv2d(in_channels, out_channels, kernel_size, stride),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(),
                nn.ReflectionPad2d(padding),
                nn.Conv2d(out_channels, out_channels, kernel_size, stride),  # second conv maps out_channels -> out_channels
                nn.BatchNorm2d(out_channels),
                nn.ReLU()
            )
        self.res1 = make_resblock(256, 256, 3, 1, 1)
        # res2 to res8 are defined the same way

# this part is in the forward pass
        for i in range(1, 9):
            res = input                                       # keep the block input for the skip connection
            input = getattr(self, 'res{}'.format(i))(input)
            input = res + input                               # residual addition

When I comment out this part, the code runs correctly.
But I'm puzzled: why? Is there something wrong?
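For reference, the same structure written as a small self-contained module with an nn.ModuleList; this is only a sketch with the channel count and block count assumed from the post above, and it has the same memory footprint as the loop version, it just makes the block registration and the loop explicit:

import torch
import torch.nn as nn

class ResStack(nn.Module):
    def __init__(self, channels=256, n_blocks=8):
        super(ResStack, self).__init__()
        def make_resblock(c, k, s, p):
            return nn.Sequential(
                nn.ReflectionPad2d(p),
                nn.Conv2d(c, c, k, s),
                nn.BatchNorm2d(c),
                nn.ReLU(),
                nn.ReflectionPad2d(p),
                nn.Conv2d(c, c, k, s),
                nn.BatchNorm2d(c),
                nn.ReLU()
            )
        # ModuleList registers all the blocks as submodules
        self.blocks = nn.ModuleList(
            [make_resblock(channels, 3, 1, 1) for _ in range(n_blocks)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # residual connection
        return x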

The environment variables are set correctly, so it must be my code's fault, but I don't know the reason.

Because maybe you are making the net small enough to fit in the leftover memory of the in-use GPU. That's why I'm telling you to check whether CUDA_VISIBLE_DEVICES works.

If it works, you have just assigned the devices wrongly.

I've checked:

  • CUDA_VISIBLE_DEVICES works fine; it excludes the other GPUs. I confirmed this by trying to create tensors on the excluded GPU indices and with the torch.cuda APIs.
  • The usage of the in-use GPUs stays unchanged when I run the code with the residual for loop commented out (it runs in that case), and the usage of the two GPUs I exposed increases from 0 to about 5 GB; data flows into both of them.

I also rewrote the for loop as separate lines using only res1 and res2, and it works fine, but with the full for-loop code it fails.

I tested again: I can use at most 3 residual blocks. With more, it reports an error with exactly the same numbers and wording.

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.92 GiB total capacity; 9.63 GiB already allocated; 223.50 MiB free; 458.66 MiB cached)

No matter whether I use 4, 5, 6, 7, or 8 blocks in the forward pass, it reports this error, and the numbers are exactly the same.

Well, so the logical conclusion is that the problem you reported is not related to the other GPUs; rather, the model you are using plus the input size is simply bigger than the GPU's capacity.
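One way to confirm this is to watch the allocator counters around a single forward/backward pass; `model` and `batch` below are placeholders for whatever the training script actually uses:

import torch

out = model(batch.cuda())   # `model` and `batch` stand in for the real training objects
loss = out.mean()           # dummy scalar just so backward can run
loss.backward()

# per-device counters, in MiB (device 0 here is physical GPU 2 because of CUDA_VISIBLE_DEVICES)
print(torch.cuda.memory_allocated(0) / 1024**2, 'MiB currently allocated')
print(torch.cuda.max_memory_allocated(0) / 1024**2, 'MiB peak')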
