What does “export CUDA_VISIBLE_DEVICES=1” really do?

111357 · July 24, 2020, 12:49am

On a multi-GPU machine (Ubuntu 16), I want to use GPU1, so I set the following in the shell:

$ export CUDA_VISIBLE_DEVICES=1
$ python

But I get an error when I set the device of the tensor:

Python 3.8.3 (default, May 19 2020, 18:47:26) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.Tensor(1).to("cuda:0")
>>> a = torch.Tensor(1).to("cuda:1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal

Why can’t I move the tensor to GPU1 with .to("cuda:1")?

111357 · July 24, 2020, 12:50am

@ptrblck, Could you please give me some advice?

ebarsoum · July 24, 2020, 1:29am

Setting CUDA_VISIBLE_DEVICES=1 means your script will see only one GPU, which is GPU1. Inside your script, however, it will be cuda:0 and not cuda:1, because the script sees only one GPU and device indices start at 0.

For example, if you set CUDA_VISIBLE_DEVICES=2,4,5, your script will see 3 GPUs with indices 0, 1, and 2.
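
To make the renumbering concrete, here is a rough sketch of the mapping in plain Python. The helper name visible_device_map is made up for illustration and is not part of PyTorch or CUDA. It also assumes the variable holds a plain comma-separated list of integer indices.

```python
def visible_device_map(env_value):
    """Map logical CUDA index -> physical GPU index.

    Hypothetical helper for illustration only (not a PyTorch API).
    Assumes env_value is a comma-separated list of integer GPU indices.
    """
    # Physical GPU indices, in the order they are listed in the variable.
    physical = [int(tok) for tok in env_value.split(",") if tok.strip()]
    # Inside the process, the visible GPUs are renumbered starting at 0.
    return {logical: phys for logical, phys in enumerate(physical)}


print(visible_device_map("1"))      # {0: 1}
print(visible_device_map("2,4,5"))  # {0: 2, 1: 4, 2: 5}
```

So with CUDA_VISIBLE_DEVICES=1, the only valid device string in the script is "cuda:0", and it refers to physical GPU1. Asking for "cuda:1" fails with the invalid device ordinal error because no second device is visible.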

111357 · July 24, 2020, 1:50am

Got it, thanks a lot!