CUDA_VISIBLE_DEVICES mistake

I’m working on a 10-GPU cluster and I’m having some trouble with CUDA_VISIBLE_DEVICES.
Reading the forum, I’ve seen it’s the recommended way of choosing arbitrary GPUs.

I’m running the following shell script:

export CUDA_VISIBLE_DEVICES=1,2,3 
export PATH=/usr/local/cuda-9.0-cudnn--v7.0/lib64/bin${PATH:+:${PATH}} 

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-9.0-cudnn--v7.0/lib64" 
export CUDA_HOME=/usr/local/cuda-9.0-cudnn--v7.0 

python
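For reference, the same selection can be checked from inside Python (a minimal sketch; the variable must be set before CUDA is initialized, and the torch import is guarded since this is only illustrative):

```python
import os

# Assumption: mirrors the export in the script above. Setting it in Python
# only works if done before the first CUDA call in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"

try:
    import torch
    if torch.cuda.is_available():
        # Only the three listed GPUs are visible, renumbered from zero
        print(torch.cuda.device_count())
except ImportError:
    pass
```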

Here is the nvidia-smi display:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:04:00.0 Off |                  N/A |
| 47%   79C    P2   190W / 250W |   6629MiB / 12196MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   26C    P8     8W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:06:00.0 Off |                  N/A |
| 41%   62C    P2    62W / 250W |  11599MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:07:00.0 Off |                  N/A |
| 41%   66C    P2   185W / 250W |  11177MiB / 12196MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN X (Pascal)    Off  | 00000000:08:00.0 Off |                  N/A |
| 58%   86C    P2   199W / 250W |  11761MiB / 12196MiB |     78%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN X (Pascal)    Off  | 00000000:0B:00.0 Off |                  N/A |
| 51%   83C    P2   213W / 250W |  11761MiB / 12196MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 23%   29C    P8     8W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN X (Pascal)    Off  | 00000000:0D:00.0 Off |                  N/A |
| 54%   83C    P2   143W / 250W |  11759MiB / 12196MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   8  GeForce GTX 108...  Off  | 00000000:0E:00.0 Off |                  N/A |
| 23%   33C    P8    11W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   9  TITAN X (Pascal)    Off  | 00000000:0F:00.0 Off |                  N/A |
| 45%   74C    P2   103W / 250W |  11761MiB / 12196MiB |     55%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     52925      C   python                                      6619MiB |
|    2     53553      C   python3                                    11589MiB |
|    3     52925      C   python                                     11167MiB |
|    4     53160      C   python                                     11749MiB |
|    5     53161      C   python                                     11749MiB |
|    7     56769      C   python                                     11747MiB |
|    9     20983      C   python                                     11749MiB |
+-----------------------------------------------------------------------------+

Now I try to store a simple tensor on the GPU:

import torch
import os
os.environ.get('CUDA_VISIBLE_DEVICES')
'1'

a=torch.rand(100)
a.cuda()
tensor([ 0.0554, 0.0375, 0.4708, 0.6522, 0.4640, 0.4087, 0.4738,
0.3571, 0.4100, 0.4238, 0.6673, 0.7015, 0.8013, 0.8452,
0.6704, 0.4123, 0.1702, 0.3805, 0.1789, 0.5453, 0.6197,
0.5231, 0.7428, 0.7978, 0.3173, 0.0653, 0.4624, 0.4298,
0.2032, 0.5640, 0.1568, 0.2366, 0.0436, 0.3464, 0.8633,
0.8253, 0.7330, 0.2782, 0.6662, 0.3576, 0.1209, 0.7470,
0.4402, 0.8037, 0.2154, 0.8686, 0.3976, 0.0305, 0.9457,
0.6998, 0.5220, 0.4419, 0.9357, 0.5723, 0.4109, 0.7055,
0.3444, 0.3484, 0.7930, 0.5491, 0.1293, 0.4718, 0.9671,
0.8292, 0.0422, 0.1354, 0.3751, 0.1575, 0.8005, 0.7624,
0.7628, 0.2370, 0.8926, 0.2794, 0.5764, 0.7508, 0.5215,
0.2245, 0.8482, 0.0440, 0.2812, 0.0715, 0.1664, 0.1170,
0.9271, 0.8802, 0.2525, 0.1377, 0.5035, 0.1035, 0.5497,
0.8906, 0.1272, 0.2019, 0.3545, 0.3818, 0.8902, 0.9140,
0.5344, 0.6614], device='cuda:0')

It’s stored on cuda:0 instead of cuda:1.

According to nvidia-smi, the process is on GPU 0:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     52646      C   python                                       509MiB |
|    0     52925      C   python                                      6619MiB |
|    2     53553      C   python3                                    11751MiB |
|    3     52925      C   python                                     11167MiB |
|    4     53160      C   python                                     11749MiB |
|    5     53161      C   python                                     11749MiB |
|    7     56769      C   python                                     11747MiB |
|    9     20983      C   python                                     11749MiB |
+-----------------------------------------------------------------------------+

Trying with GPUs 1,2 (as numbered by nvidia-smi):

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7467      C   python                                       509MiB |
|    0     52925      C   python                                      6619MiB |
|    1      7467      C   python                                       509MiB |
|    2      7388      C   python3                                    11751MiB |
|    3     52925      C   python                                     11167MiB |
|    4     53160      C   python                                     11749MiB |
|    5     53161      C   python                                     11749MiB |
|    7     56769      C   python                                     11747MiB |
|    9     20983      C   python                                     11749MiB |
+-----------------------------------------------------------------------------+

It stores variables on GPU 0 and GPU 1.

Why?

Another simple question: are device_ids in PyTorch relative to the visible GPUs? I mean, if you set CUDA_VISIBLE_DEVICES=4,5,8, would GPU 4 from that list be cuda:0 for PyTorch, and so on?

Try setting export CUDA_DEVICE_ORDER=PCI_BUS_ID in your terminal before executing your script. By default CUDA may enumerate devices fastest-first, so the indices in CUDA_VISIBLE_DEVICES need not match nvidia-smi’s ordering.
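A minimal sketch of that suggestion in Python (assumes both variables are set before CUDA is initialized in the process):

```python
import os

# PCI_BUS_ID makes CUDA enumerate GPUs in the same order nvidia-smi shows
# them; the default (FASTEST_FIRST) can order them differently, so "1" in
# CUDA_VISIBLE_DEVICES may otherwise point at a different physical card.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"
```

This is equivalent to running `export CUDA_DEVICE_ORDER=PCI_BUS_ID` in the shell before launching python, as the shell script at the top of the thread does for the other variables.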

Yes, the visible device ids are remapped to cuda:0, cuda:1, … starting at zero.
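As a plain-Python illustration of that remapping, using the hypothetical CUDA_VISIBLE_DEVICES=4,5,8 from the question above:

```python
# Physical GPU ids made visible to the process
visible = "4,5,8".split(",")

# Inside the process they are renumbered from zero
mapping = {f"cuda:{i}": f"physical GPU {gpu}" for i, gpu in enumerate(visible)}
print(mapping)
# {'cuda:0': 'physical GPU 4', 'cuda:1': 'physical GPU 5', 'cuda:2': 'physical GPU 8'}
```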
