os.environ["CUDA_VISIBLE_DEVICES"] not functioning

I have a total of 4 GPUs and want to use GPU no. 2 for my experiments. At the top of the code I set
os.environ["CUDA_VISIBLE_DEVICES"]='2', but I see that I am still using GPU no. 0.

Also, torch.cuda.device_count() returns 4. How can I fix this?

Maybe this helps:

device = torch.cuda.device(2)

as described here; the current device can be checked using torch.cuda.current_device()

or

device = torch.device('cuda:2')

as described here

And here is an overview of CUDA semantics.

Please keep 0-indexing in mind: cuda:2 is your third CUDA device, not your second.

Thanks for the quick reply. But that is not possible for me, as I have another script which tells me which GPUs are free and returns a string, e.g. '2,3'. That is convenient to pass to os.environ["CUDA_VISIBLE_DEVICES"], but not possible with the other styles you mentioned.

Hi,

Two things:

  • The CUDA_VISIBLE_DEVICES environment variable is read by the CUDA driver, so it needs to be set before the CUDA driver is initialized. It is best to make sure it is set before importing torch (or at least before you do anything CUDA-related in torch).
  • The device numbers within your program always start at 0 and go up, even if you mask to only see devices 2 and 3: from within your program they will be numbered 0 and 1.
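
A minimal sketch of both points, assuming the free-GPU script has returned the string '2,3' (the variable names free_gpus and local_to_physical are made up for illustration). The torch import is guarded only so the snippet also runs where PyTorch is not installed; the ids on the right-hand side of the mapping are the ones CUDA would assign without the mask.

```python
import os

# Hypothetical result of the "which GPUs are free" script mentioned above.
free_gpus = "2,3"

# Set the mask first, before torch (or any module that touches CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = free_gpus

try:
    import torch  # imported only after the mask is in place
except ImportError:
    torch = None  # lets the sketch run where PyTorch is not installed

# Inside the process, visible devices are renumbered from 0 in the order listed.
local_to_physical = {
    local: int(physical) for local, physical in enumerate(free_gpus.split(","))
}
print(local_to_physical)  # {0: 2, 1: 3}

if torch is not None and torch.cuda.is_available():
    # Counts only the devices that survived the mask.
    print(torch.cuda.device_count())
```

With this mask, torch.device('cuda:0') in the program refers to the GPU that would otherwise be device 2, and 'cuda:1' to device 3.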

Thanks! Point one solved my issue.
I had imported a file in which the CUDA device was getting initialized. I set os.environ["CUDA_VISIBLE_DEVICES"] at the very top and it worked as expected.

Hi Pal, I’ve been trying to get this going as well, but unfortunately with limited success.
I cannot get the GPU to run despite the multiple commands I’ve tried.

Can you guide me a bit? You seem to have a better understanding than I do… I am learning by trial and error and it’s tedious and painful. I’m learning, but not getting to where I’d like to be.

So, I’ve tried:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py
CUDA_VISIBLE_DEVICES=1 python train.py
CUDA_VISIBLE_DEVICES=2 python3 train.py
CUDA_VISIBLE_DEVICES=3 python3 train.py

When I run nvidia-smi, all of the GPUs are OFF, and to me that means I’ve missed something somewhere that I cannot seem to see:

Tue Jan 17 11:42:31 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 00000002:01:00.0 Off |                    0 |
| N/A   27C    P0    30W / 300W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 00000003:01:00.0 Off |                    0 |
| N/A   26C    P0    30W / 300W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-SXM2...  Off  | 0000000A:01:00.0 Off |                    0 |
| N/A   28C    P0    29W / 300W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-SXM2...  Off  | 0000000B:01:00.0 Off |                    0 |
| N/A   29C    P0    29W / 300W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+---------------------------------------

The code is running, but on the CPU. Can you assist me by guiding my understanding of what’s going on, please?

Hi,

Your code needs to be modified to send both the model and the inputs to the CUDA device (when available) for the device to be used.
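
As a rough sketch of what that change could look like: the nn.Linear model and the random batch below are stand-ins, since the contents of train.py are not shown in this thread, so the names and shapes are assumptions.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to this process, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; replace with the model built in train.py.
model = nn.Linear(16, 2).to(device)  # moves the parameters to `device`

# Hypothetical stand-in batch; tensors are created on the CPU by default.
inputs = torch.randn(8, 16).to(device)  # .to() returns a copy on `device`

with torch.no_grad():
    outputs = model(inputs)

print(f"running on {outputs.device}")
```

In a real training loop, every batch coming out of the data loader would need the same .to(device) call, because the model and its inputs have to live on the same device.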