CUDA_VISIBLE_DEVICE is of no use

I have a 4-Titan-XP GPU server. When I use os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" to allocate GPUs for a task in Python, I find that only GPU 0 is used, and there are out-of-memory problems even though GPU 1 is free.
Should I allocate memory to different GPUs myself?

3 Likes

Use CUDA_VISIBLE_DEVICES (not “DEVICE”). You have to set it before you launch the program – you can’t do it from within the program.

4 Likes

My bad, there is a typo in my post. But in my code, when I use
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
only GPU 1 is used. At least that line has an effect: it does control which GPUs can be used.
However, it is supposed to make GPUs 1 and 2 available for the task, yet only GPU 1 is actually used. Even when GPU 1 runs out of memory, GPU 2 is not touched. Is there some other switch that controls parallel computing across two GPUs?
BTW, another question: does PyTorch tend to fill GPUs one by one, or does it allocate memory evenly across them?

You can push your data and models to a specific GPU using .cuda(gpu_id). E.g. you can load a generator network on one GPU and the discriminator on the other.
Another option is to use the DataParallel module.
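
For example, something along these lines (a minimal sketch, assuming at least two visible GPUs; the networks and shapes here are just placeholders):

import torch
import torch.nn as nn

generator = nn.Linear(100, 784).cuda(0)    # generator lives on GPU 0
discriminator = nn.Linear(784, 1).cuda(1)  # discriminator lives on GPU 1

noise = torch.randn(16, 100).cuda(0)
fake = generator(noise)                    # computed on GPU 0
score = discriminator(fake.cuda(1))        # move activations to GPU 1 before the discriminator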

3 Likes

Do you mean that if I want to use two GPUs at the same time, I have to change my source code and add parallelism-related code?
Or does it just use one GPU at most if nothing is specified?

Basically it is just one line to use DataParallel:

net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])  # replicate the model onto GPUs 0, 1 and 2
output = net(input_var)                                   # the input batch is split along dim 0 across those GPUs

Just wrap your model in DataParallel and call the returned net on your data.
The device_ids argument specifies which GPUs will be used.
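
For reference, a minimal self-contained sketch (assuming at least two visible GPUs; the model and tensor shapes are made up for illustration). Note that the model's parameters and the input should live on device_ids[0], and the outputs are gathered back there as well:

import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda(0)               # parameters on device_ids[0]
net = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(8, 10).cuda(0)                 # batch of 8, split 4/4 across the two GPUs
out = net(x)                                   # results gathered back onto device_ids[0]
print(out.shape)                               # torch.Size([8, 2])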

7 Likes

It doesn’t work for me. Still, only the first GPU is used. Are there any more tricks?

2 Likes

Does DataParallel only work with a batch size greater than 1? Actually, I only use batch=1.

You need a batch size >= the number of GPUs for data parallelism to apply. As its name suggests, DataParallel just pushes the computation for different samples in a batch to different GPUs.
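
A quick way to see the split (a toy sketch, assuming two visible GPUs): each replica only receives its chunk of the batch, so with batch=1 the second GPU never gets any work.

import torch
import torch.nn as nn

class EchoDevice(nn.Module):
    def forward(self, x):
        print(x.device, x.shape)   # each replica prints the slice it received
        return x

net = nn.DataParallel(EchoDevice().cuda(), device_ids=[0, 1])
net(torch.randn(1, 4).cuda())      # batch=1: only cuda:0 prints, GPU 1 stays idle
net(torch.randn(4, 4).cuda())      # batch=4: both GPUs print a chunk of 2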

2 Likes

Hi! Adding os.environ['CUDA_VISIBLE_DEVICES'] = "2" in my code does not work; the code always selects the first GPU. However, CUDA_VISIBLE_DEVICES=2 python train.py works.
I have seen setting os.environ['CUDA_VISIBLE_DEVICES'] in code work elsewhere, though. Do you know why?

@MrTuo This is how the PyTorch 0.4.1 convention works: if you set CUDA_VISIBLE_DEVICES=2,3, then for PyTorch physical GPU 2 is cuda:0 and physical GPU 3 is cuda:1. Just check whether your code is consistent with this convention.
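
In other words (a small sketch, assuming the process was launched with CUDA_VISIBLE_DEVICES=2,3):

import torch

x = torch.randn(4).to('cuda:0')   # lands on physical GPU 2
y = torch.randn(4).to('cuda:1')   # lands on physical GPU 3
# 'cuda:2' would fail here, since only two devices are visible to this process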

3 Likes

I had this same issue where setting CUDA_VISIBLE_DEVICES=2 python train.py worked but setting os.environ['CUDA_VISIBLE_DEVICES'] = "2" didn’t. The cause of the issue for me was importing the torch packages before setting os.environ['CUDA_VISIBLE_DEVICES']; moving the assignment to the top of the file, before importing torch, solved it. Hope this helps.
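
A minimal sketch of that ordering (assuming the machine actually has a GPU with physical index 2):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"   # must run before torch is imported anywhere

import torch
print(torch.cuda.device_count())           # 1 -- only physical GPU 2 is visible
print(torch.cuda.current_device())         # 0 -- that GPU is now cuda:0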

25 Likes

That’s helpful for me, thanks 3000 times

thank you, it works.

Hey, I have the opposite problem: my code is using both of my GPUs by default, no matter what I do. They are different GPU models and I do NOT want to use them for parallel processing. I’ve tried selecting GPU #0 with CUDA_VISIBLE_DEVICES, tried setting it with torch, and moved it to the beginning of the code; nothing is working.

Just for the record, I am doing deep-learning object detection, importing arcgis and torch. Everything else seems to work fine now, until I try to test the learning rate and it tells me my GPUs are imbalanced and that I should exclude GPU #1. I never wanted GPU #1 to be utilized in the first place.

EDIT: Never mind, it appears to be working now after I moved it towards the beginning of the code. I guess I reset the kernel somehow, which made it work. I’m just a rookie :stuck_out_tongue:

When I tried this solution (I have two GPUs), it shows an error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

I’m not sure which solution you are referring to, but the error could be raised if you manually specify a device inside the model.
Could you post an executable code snippet that reproduces the issue, so that we could debug it, please?
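
For example, a (hypothetical) pattern like this would trigger that error under DataParallel, because the device hard-coded inside forward conflicts with the device of the replica's parameters:

import torch
import torch.nn as nn

class BadModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        x = x.to('cuda:1')        # device hard-coded inside the model
        return self.fc(x)         # this replica's weights are on cuda:0 -> mismatch

net = nn.DataParallel(BadModel().cuda(), device_ids=[0, 1])
net(torch.randn(4, 10).cuda())    # RuntimeError: tensors found on cuda:0 and cuda:1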

Saved 30 hours of my life. Thanks a ton.

It’s also possible to run into this with bad conda environments.

For me,

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2" # just use one GPU on big machine
import torch
assert torch.cuda.device_count() == 1

failed, but that was because my environment was problematic, and only

import torch 
print(torch.cuda.current_device())

actually raised an error.