I would like to launch one PyTorch task per GPU from within Python. Each process is independent and does not need to share any data.
I use a program similar to the one shown below. Using `nvidia-smi`, I can verify that all GPUs are in use when I run it.
    import os
    import time
    from multiprocessing import Process

    def launch_proc(gpu_id):
        # Restrict this process to a single GPU before importing torch,
        # so CUDA sees only that device.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        import torch
        # Allocate memory on the visible GPU
        tensor = torch.Tensor((10**8) * [gpu_id]).cuda()
        time.sleep(10)  # wait for 10 seconds

    num_gpus = 4
    processes = []
    for gpu_id in range(num_gpus):
        p = Process(target=launch_proc, args=(gpu_id,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
Is there a cleaner implementation, or is the above solution the best one? Also, if I call torch.cuda.device_count() in the parent process to get the number of GPUs, I get the following error:
RuntimeError: CUDA error: initialization error
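From what I have read, this error may happen because torch.cuda.device_count() initializes CUDA in the parent process, and a forked child then cannot re-initialize it. A sketch of what I am considering, using the 'spawn' start method instead of the default fork (run_all is a name I made up for this sketch, and the torch calls are guarded so it also runs on a machine without torch or a GPU):

    import os
    import multiprocessing as mp

    def launch_proc(gpu_id):
        # Expose exactly one GPU to this process before torch is imported,
        # so torch.cuda sees only that device.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        try:
            import torch  # imported inside the worker on purpose
            if torch.cuda.is_available():
                # allocate some memory on the single visible GPU
                tensor = torch.full((10**8,), float(gpu_id), device='cuda')
        except ImportError:
            pass  # lets the sketch run on machines without torch installed

    def run_all(num_gpus):
        # 'spawn' starts each child in a fresh interpreter, so CUDA state
        # created in the parent (e.g. by torch.cuda.device_count()) is not
        # inherited the way it is with 'fork'.
        ctx = mp.get_context('spawn')
        procs = [ctx.Process(target=launch_proc, args=(i,)) for i in range(num_gpus)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return [p.exitcode for p in procs]

With this, it seems I could safely call run_all(torch.cuda.device_count()) from the parent, but I am not sure whether this is the recommended pattern.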