Runtime error while multiprocessing


I’m having trouble with multiple processes working on the same GPU, so I wrote a minimal error-reproducing example.
The example runs successfully on my local machine with CUDA 10.2 and PyTorch 1.2.0,
but it fails on a cluster with CUDA 10.1 and PyTorch 1.2.0.

Does anybody know why or how to overcome this? Thanks a ton.


import torch.multiprocessing as _mp
import torch
import os
import time
import numpy as np

mp = _mp.get_context('spawn')

class Process(mp.Process):
    def __init__(self, id):
        super().__init__()  # initialize the base mp.Process
        print("Init Process")
        self.id = id

    def run(self):
        os.environ['CUDA_VISIBLE_DEVICES'] = '0'
        for i in range(3):
            with torch.cuda.device(0):
                x = torch.Tensor(10).to(0)
                del x

if __name__ == "__main__":
    num_processes = 2
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    processes = [Process(i) for i in range(num_processes)]
    [p.start() for p in processes]
    [p.join() for p in processes]
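For reference, a CPU-only sketch of the same spawn pattern using only the standard library (no CUDA; the hypothetical `work` function stands in for the tensor allocation) runs without issue, which makes me suspect the CUDA context creation in the child processes rather than the process setup itself:

```python
import multiprocessing as _mp

mp = _mp.get_context('spawn')

def work(id):
    # CPU-only stand-in for the CUDA allocation in the repro above
    return sum(range(10)) + id

class Process(mp.Process):
    def __init__(self, id, queue):
        super().__init__()      # initialize the base Process class
        self.id = id
        self.queue = queue

    def run(self):
        self.queue.put((self.id, work(self.id)))

if __name__ == "__main__":
    queue = mp.Queue()
    processes = [Process(i, queue) for i in range(2)]
    [p.start() for p in processes]
    [p.join() for p in processes]
    print(sorted(queue.get() for _ in range(2)))  # [(0, 45), (1, 46)]
```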


Process Process-2:
Traceback (most recent call last):
  File "/cluster/home/marksm/software/anaconda/envs/test/lib/python3.6/multiprocessing/", line 258, in _bootstrap
  File "/cluster/home/marksm/", line 20, in run
    x = torch.Tensor(10).to(0)
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable