Setting visible devices with Distributed Data Parallel

Is it possible to make CUDA_VISIBLE_DEVICES and DDP work together?

I am trying to run a script on an 8-GPU server like so:

CUDA_VISIBLE_DEVICES=0,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=7 --use_env main.py

but I always run into:

RuntimeError: CUDA error: invalid device ordinal

Here is the output of nvidia-smi:

Tue Aug 18 15:21:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:04:00.0 Off |                  N/A |
| 20%   13C    P8     7W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:05:00.0 Off |                  N/A |
| 23%   18C    P8     8W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  On   | 00000000:08:00.0 Off |                  N/A |
| 23%   20C    P8     8W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  On   | 00000000:09:00.0 Off |                  N/A |
| 23%   23C    P8     8W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  On   | 00000000:84:00.0 Off |                  N/A |
| 23%   18C    P8    11W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  On   | 00000000:85:00.0 Off |                  N/A |
| 20%   16C    P8     7W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  On   | 00000000:88:00.0 Off |                  N/A |
| 20%   15C    P8     7W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  On   | 00000000:89:00.0 Off |                  N/A |
| 23%   25C    P8     7W / 235W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

What am I missing?

Hey @Diego, the launch script spawns multiple subprocesses, which might inherit the CUDA_VISIBLE_DEVICES value you passed on the command line. A workaround would be to set CUDA_VISIBLE_DEVICES in main.py before importing any CUDA-related packages. Note that the recommended way to use DDP is one process per device, i.e., each process should run exclusively on one GPU. If you want that, you need to set CUDA_VISIBLE_DEVICES to a different value for each subprocess.
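
For example, a minimal sketch of that workaround (the device list here is just the one from your command, and the variable must be set before torch is imported so CUDA initialization sees it):

import os

# Restrict visible devices before any CUDA-related package is imported.
# The device list is an example value; adjust per subprocess as needed.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3,4,5,6,7"

import torch  # imported only after CUDA_VISIBLE_DEVICES is set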

BTW, what’s the default CUDA_VISIBLE_DEVICES value on your machine? I would assume the script can see all devices by default if CUDA_VISIBLE_DEVICES wasn’t set. And when the program throws RuntimeError: CUDA error: invalid device ordinal, do you know which device it is trying to access?
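
If it helps, here is a quick diagnostic sketch (hypothetical, not from your script) you could drop into main.py to see what each process actually sees:

import os
import torch

# Print the visibility mask and the device count this process ends up with.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.device_count() =", torch.cuda.device_count())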

Sorry, this was a mistake on my end. I had set the device_ids argument in the DDP constructor in addition to using the CUDA_VISIBLE_DEVICES variable; once I removed the former, the script ran as expected.
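
For anyone hitting the same error: once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered 0..N-1, so any device index you pass (including in device_ids) must use the remapped ordinals, not the physical GPU IDs. A minimal per-process sketch consistent with the --use_env launch above (the Linear layer is just a stand-in model):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# --use_env makes torch.distributed.launch export LOCAL_RANK
# instead of passing a --local_rank argument.
local_rank = int(os.environ["LOCAL_RANK"])

# With CUDA_VISIBLE_DEVICES=0,2,3,4,5,6,7 the visible ordinals are 0..6,
# so local_rank (0..6 for 7 processes) is a valid device index.
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

model = torch.nn.Linear(10, 10).to(local_rank)  # stand-in model
ddp_model = DDP(model, device_ids=[local_rank])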