Running on a specific GPU device

I’m trying to specify which single GPU to run code on from within Python, by setting the GPU index visible to PyTorch. Here’s what I’ve tried:

import os
import torch

for i in range(8):  # 8 GPUs
    os.environ["CUDA_AVAILABLE_DEVICES"] = str(i)
    print(torch.cuda.device_count())
    # this line always outputs 8 (all 8 devices) instead of 1...
    ...

I’m using PyTorch 1.0.0. How do I specify which GPU (by index or otherwise) to run code on, without using .to(device), from within Python code?

Hey @CCL, you will need to set the CUDA_AVAILABLE_DEVICES env var before launching the process. Something like:

$ CUDA_AVAILABLE_DEVICES=0 python main.py

If you just want to set the default device, you can use torch.cuda.set_device.
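
For example, a minimal sketch (assuming the machine has at least two GPUs; the index 1 is just for illustration):

import torch

torch.cuda.set_device(1)                # make GPU 1 the default device for this process
print(torch.cuda.current_device())      # 1
x = torch.randn(3, 3).cuda()            # .cuda() with no index uses the current device
print(x.device)                         # cuda:1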

Update

Please ignore the shell command above. I misread the variable name; it should be CUDA_VISIBLE_DEVICES.

Hi, thanks for the reply. Is it possible to set it from Python instead of in the shell?

Yes, but you need to make sure it is set before initializing the CUDA context. See the code below:

import os
import torch

os.environ["CUDA_VISIBLE_DEVICES"] = "1"
torch.cuda.device_count()  # prints 1
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
torch.cuda.device_count()  # still prints 1: the CUDA context has already been initialized

Do you mean that once it’s set, it cannot be changed? For my case, I’m hoping to make GPU 0 visible on the 1st iteration, GPU 1 visible on the 2nd, and so on up to GPU 7 on the 8th iteration. Is there a way to do this from Python? Thanks a lot!

Do you mean that once it’s set, it cannot be changed?

I believe so.

For my case, I’m hoping to make GPU 0 visible on the 1st iteration, GPU 1 visible on the 2nd, and so on up to GPU 7 on the 8th iteration. Is there a way to do this from Python?

Can this be done by explicitly passing torch.cuda.device(i) to tensors/modules, or by using torch.cuda.set_device(i)? Is there any reason you would like to change the visible devices?
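
For example, rather than changing CUDA_VISIBLE_DEVICES on each iteration, a rough sketch along these lines should work (assuming all 8 GPUs are left visible to the process):

import torch

for i in range(torch.cuda.device_count()):        # 0..7 on an 8-GPU machine
    device = torch.device("cuda", i)
    x = torch.randn(1000, 1000, device=device)    # this tensor lives on GPU i
    print(i, x.device)
    # alternatively: torch.cuda.set_device(i), then plain .cuda() calls use GPU i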

I’m trying to do some parallelism, but I can’t figure out how to initialise processes with different ranks, with each process on a different GPU. I am modifying some distributed computing code, but instead of having numerous nodes, I only have a single machine with 8 GPUs to work with.

(Please bear with me, I’m a beginner in distributed computing!) So far I’ve worked out that the line dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, world_size=1, rank=args.rank) initialises the same process on all 8 GPUs, but this causes problems later in my code, where I need to get the specific GPU index. I tried to do this with torch.cuda.current_device(), but it also returns 0, despite nvidia-smi showing that all 8 GPUs are being used.

The init_process_group API only sets up the process in which it is invoked. And since world_size is set to 1, it only expects one process in the distributed training gang. If you would like to use multiple processes, please see this example.
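
Not the linked example, but as a rough single-machine sketch of my own (assuming the nccl backend and the default env:// rendezvous via MASTER_ADDR/MASTER_PORT), something like this starts one process per GPU, each with its own rank:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)             # pin this process to GPU `rank`
    print(rank, torch.cuda.current_device())
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # 8 on your machine
    mp.spawn(worker, args=(world_size,), nprocs=world_size)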

but this causes problems later in my code, where I need to get the specific GPU index.

To get that index, would it work if you use args.rank?

I tried to do this with torch.cuda.current_device(), but it also returns 0, despite nvidia-smi showing that all 8 GPUs are being used.

torch.cuda.current_device() returns the current device. By default, that is the first GPU, which is indeed indexed by 0.
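
For example (a quick check, assuming the machine has more than one GPU):

import torch

print(torch.cuda.current_device())   # 0, the default
torch.cuda.set_device(3)
print(torch.cuda.current_device())   # 3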

I just tried using args.rank, but it seems like all the processes return rank 0. I’m really quite lost on how to do this parallelism.

How did you launch those 8 processes? Did you launch them using code similar to the example, or with the launching script?

And how did you set args.rank? I presume it’s command-line args + argparse?

It would be helpful to have self-contained, minimal repro code so that we can help debug.
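
For instance, if the rank comes from argparse, a hypothetical skeleton like this is roughly what I would expect; the flag names here are my assumptions, not your actual code:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rank", type=int, default=0)        # hypothetical flag name
parser.add_argument("--world-size", type=int, default=8)  # hypothetical flag name
args = parser.parse_args()

# Each of the 8 processes has to be launched with a different --rank (0..7);
# if they all keep the default, every process will report rank 0.
print(args.rank)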

To use a different GPU in the system, isn’t it just a matter of how you declare the device?
mydevice = torch.device("cuda:2")
or
mydevice = torch.device("cuda", 2)

The point is that you have to pass the ordinal of the GPU you want to use.
See torch.device at Tensor Attributes — PyTorch 1.8.1 documentation.
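
For example (a small sketch; "cuda:2" assumes the machine has at least three GPUs):

import torch

mydevice = torch.device("cuda:2")                  # equivalent to torch.device("cuda", 2)
x = torch.zeros(4, device=mydevice)                # tensor allocated on GPU 2
model = torch.nn.Linear(4, 4).to(mydevice)         # module parameters moved to GPU 2
print(x.device, next(model.parameters()).device)   # cuda:2 cuda:2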