I’m trying to specify which single GPU to run code on from within Python, by setting the GPU index visible to PyTorch. Here’s what I’ve tried:
import os
import torch

for i in range(8):  # 8 GPUs
    os.environ["CUDA_VISIBLE_DEVICES"] = str(i)  # the env var is CUDA_VISIBLE_DEVICES
    print(torch.cuda.device_count())
    # this line always outputs 8 (all 8 devices) instead of 1...
    ...
I’m using PyTorch 1.0.0. How do I specify which GPU (by index or otherwise) to run code on, without using .to(device), from within Python code?
Do you mean that once it’s set, it cannot be changed? For my case, I’m hoping to make GPU 0 visible on the 1st iteration, GPU 1 on the 2nd, and so on up to GPU 7 on the 8th iteration. Is there a way to do this from Python? Thanks a lot!
Do you mean that once it’s set, it cannot be changed?
I believe so. CUDA_VISIBLE_DEVICES is only read when the CUDA runtime initialises in a process, so changing it afterwards has no effect.
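As a rough illustration (the GPU index 3 here is just a hypothetical choice), setting the variable before torch initialises CUDA should restrict the process to one GPU:

import os

# Must happen before the first CUDA call (in practice, before importing torch),
# because the CUDA runtime reads CUDA_VISIBLE_DEVICES only once at initialisation.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # expose only physical GPU 3

import torch

print(torch.cuda.device_count())  # should now print 1
# Inside this process the single visible GPU is re-indexed as cuda:0.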
For my case, I’m hoping to make GPU 0 visible on the 1st iteration, GPU 1 visible on the 2nd, etc till GPU 7 and iter 8. Is there a way to do this from Python?
Can this be done by explicitly passing torch.cuda.device(i) to tensors/modules, or by using torch.cuda.set_device(i)? Is there any reason you would like to change the visible devices?
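For example, a minimal sketch of both approaches (the loop over 8 GPUs mirrors your snippet and is just for illustration):

import torch

for i in range(8):
    # Option 1: make GPU i the default device for subsequent allocations
    torch.cuda.set_device(i)
    x = torch.randn(2, 2).cuda()  # allocated on cuda:i
    print(x.device)

    # Option 2: scope the default device with a context manager
    with torch.cuda.device(i):
        y = torch.randn(2, 2).cuda()  # also allocated on cuda:i
    print(y.device)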
I’m trying to do some parallelism but can’t figure out how to initialise processes with different ranks, with each process on a different GPU. I am modifying some distributed computing code, but instead of having numerous nodes, I only have a single machine with 8 GPUs.
(Please bear with me, I’m a beginner in distributed computing!) So far I’ve worked out that the line dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, world_size=1, rank=args.rank) initialises the same process on all 8 GPUs, but this causes problems later in my code, where I need to get the specific GPU index. I tried to do this with torch.cuda.current_device(), but it also returns 0 despite nvidia-smi showing that all 8 GPUs have been used.
The init_process_group API only sets up the process where this function is invoked, and since world_size is set to 1, it only expects one process in the distributed training gang. If you would like to use multiple processes, please see this example.
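As a rough sketch of the single-machine, one-process-per-GPU pattern (the NCCL backend and the local rendezvous address below are assumptions, not taken from your code):

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # One process per GPU: rank doubles as the GPU index on this machine.
    dist.init_process_group(
        backend="nccl",                       # assumption: NCCL for GPU training
        init_method="tcp://127.0.0.1:23456",  # hypothetical local rendezvous address
        world_size=world_size,
        rank=rank,
    )
    torch.cuda.set_device(rank)  # bind this process to GPU `rank`
    print("rank %d using GPU %d" % (rank, torch.cuda.current_device()))
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 8  # one process per GPU
    mp.spawn(worker, args=(world_size,), nprocs=world_size)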
but this causes problems later in my code, where I need to get the specific GPU machine index.
To get the machine index, would it work to use args.rank?
I tried to do this with torch.cuda.current_device() but it also returns 0 despite nvidia-smi showing that all 8 GPUs have been used.
torch.cuda.current_device() returns the currently selected device. By default this is the first GPU, which is indeed indexed by 0.
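In other words, it only changes once you select a device in that process, e.g. (index 3 is a hypothetical choice, assuming at least 4 GPUs):

import torch

print(torch.cuda.current_device())  # 0: the default device

torch.cuda.set_device(3)
print(torch.cuda.current_device())  # 3: reflects the per-process selection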