Launch independent PyTorch processes on each GPU from Python

Problem:
I would like to launch one PyTorch task per GPU from within Python. The processes are independent and do not need to share any data.

Solution:
I use a program similar to the one below. Running ‘nvidia-smi’ while it executes confirms that all GPUs are in use.

import os
from multiprocessing import Process
import time

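def launch_proc(gpu_id):
    # Expose only this GPU to the process; must be set before CUDA is initialized
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    import torch
    # Allocate a 10**8-element float tensor on the only visible GPU
    tensor = torch.full((10**8,), float(gpu_id), device='cuda')
    time.sleep(10)  # hold the memory for 10 seconds so it shows up in nvidia-smi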

num_gpus = 4
processes = []
for gpu_id in range(num_gpus):
    p = Process(target=launch_proc,args=(gpu_id,))
    p.start()
    processes.append(p)

for p in processes:
    p.join()

Question:
Is there a cleaner implementation, or is the above solution the best one? Also, if I call torch.cuda.device_count() in the above solution to get the number of GPUs, I get the following error:
RuntimeError: CUDA error: initialization error
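
This error most likely appears because torch.cuda.device_count() touches CUDA in the parent process, and the child processes are then created with the default fork start method on Linux; CUDA generally cannot be used in a forked child once the parent has initialized it. One possible workaround, sketched here under assumptions, is to ask for the device count in a short-lived helper process so that the parent never touches CUDA. The names count_devices and TORCH_PROBE are hypothetical, and the sketch assumes torch is importable by the same interpreter:

```python
import subprocess
import sys

# Assumed probe: asks PyTorch for the device count inside the helper process.
TORCH_PROBE = "import torch; print(torch.cuda.device_count())"

def count_devices(probe_code=TORCH_PROBE):
    """Run probe_code in a fresh interpreter and parse the integer it prints."""
    result = subprocess.run(
        [sys.executable, "-c", probe_code],
        capture_output=True, text=True, check=True)
    # Use the last non-empty stdout line in case the import prints extra output.
    lines = [ln for ln in result.stdout.splitlines() if ln.strip()]
    return int(lines[-1])
```

With this helper, num_gpus = count_devices() could replace the hard-coded num_gpus = 4 without initializing CUDA in the launching process.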

Hey, I have a similar use case.
What did you end up doing?
Thanks

I used something similar to the following code:

import os
import multiprocessing as mp
import time
import torch

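def launch_proc(gpu_id):
    # Restrict this worker to a single GPU before any CUDA call is made
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    import torch
    # Allocate a 10**8-element float tensor on the only visible GPU
    tensor = torch.full((10**8,), float(gpu_id), device='cuda')
    time.sleep(10)  # hold the memory for 10 seconds so it shows up in nvidia-smi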

if __name__ == '__main__':
    num_gpus = torch.cuda.device_count()
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=num_gpus) as pool:
        pool.starmap(launch_proc, [(i,) for i in range(num_gpus)])

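The main changes were to use the spawn context and to put the process-launching code inside an if __name__ == '__main__' guard. With spawn, each worker starts in a fresh interpreter instead of inheriting CUDA state from a forked parent, so calling torch.cuda.device_count() in the parent no longer breaks the workers. The guard is needed because spawn re-imports the main module in every worker, and without it each worker would try to launch the pool again.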
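
Since the tasks share no data, another option is to skip multiprocessing and start each worker as a separate Python process with its own environment. The following is only a minimal sketch of that idea, not a tested recipe for real training jobs: launch_per_gpu and WORKER_CODE are hypothetical names, and the worker here just prints the GPU it was assigned instead of importing torch.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the real per-GPU workload; a real job would
# import torch here and run its own training or inference code.
WORKER_CODE = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"

def launch_per_gpu(num_gpus, worker_code=WORKER_CODE):
    """Start one independent Python process per GPU and collect their stdout."""
    procs = []
    for gpu_id in range(num_gpus):
        # Copy the parent's environment and pin this child to one GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(subprocess.Popen(
            [sys.executable, "-c", worker_code],
            env=env, stdout=subprocess.PIPE, text=True))
    # All children are already running; now wait for each one in turn.
    outputs = []
    for p in procs:
        out, _ = p.communicate()
        outputs.append(out.strip())
    return outputs

if __name__ == "__main__":
    print(launch_per_gpu(2))  # prints ['0', '1']
```

Because CUDA_VISIBLE_DEVICES is set in each child's environment before the interpreter starts, the parent never has to import torch, and every child sees its assigned GPU as device 0.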