I would like to launch one PyTorch task per GPU from within Python. Each process is independent and does not need to share any data.
I use a program similar to the one shown below. Using `nvidia-smi`, I can verify that all GPUs are in use when I run it.
    import os
    import time
    from multiprocessing import Process

    def launch_proc(gpu_id):
        # Restrict this process to a single GPU before importing torch,
        # so CUDA sees only that device.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        import torch
        # Allocate memory on the visible GPU
        tensor = torch.Tensor((10**8) * [gpu_id]).cuda()
        time.sleep(10)  # wait for 10 seconds

    num_gpus = 4
    processes = []
    for gpu_id in range(num_gpus):
        p = Process(target=launch_proc, args=(gpu_id,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
Is there a cleaner implementation, or is the above solution the best one? Also, if I call torch.cuda.device_count() in the parent process to get the number of GPUs, I get the following error:
RuntimeError: CUDA error: initialization error
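From what I have read, this error may happen because torch.cuda.device_count() initializes CUDA in the parent process, and a forked child then cannot re-initialize it. A sketch of what I am considering, using the 'spawn' start method instead of the default fork (run_all is a name I made up for this sketch, and the torch calls are guarded so it also runs on a machine without torch or a GPU):

    import os
    import multiprocessing as mp

    def launch_proc(gpu_id):
        # Expose exactly one GPU to this process before torch is imported,
        # so torch.cuda sees only that device.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        try:
            import torch  # imported inside the worker on purpose
            if torch.cuda.is_available():
                # allocate some memory on the single visible GPU
                tensor = torch.full((10**8,), float(gpu_id), device='cuda')
        except ImportError:
            pass  # lets the sketch run on machines without torch installed

    def run_all(num_gpus):
        # 'spawn' starts each child in a fresh interpreter, so CUDA state
        # created in the parent (e.g. by torch.cuda.device_count()) is not
        # inherited the way it is with 'fork'.
        ctx = mp.get_context('spawn')
        procs = [ctx.Process(target=launch_proc, args=(i,)) for i in range(num_gpus)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return [p.exitcode for p in procs]

With this, it seems I could safely call run_all(torch.cuda.device_count()) from the parent, but I am not sure whether this is the recommended pattern.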