CUDA multiprocessing

aman_bharat · October 11, 2019, 2:17am

Hi,
I am trying to use multiprocessing in my Python program. I created two processes: one runs a neural network and the other runs a computationally heavy function. I wanted the neural net to run on the GPU and the other function on the CPU, so I moved the neural net to the GPU with the cuda() method. But when I run the program I get the following error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the ‘spawn’ start method

So I tried the spawn and forkserver start methods as well, but then I got a different error:
RuntimeError: cuda runtime error (71) : operation not supported at …/torch/csrc/generic/StorageSharing.cpp:245

I have tried both Python 3 multiprocessing and torch.multiprocessing, but neither worked for me.

albanD · October 12, 2019, 10:14pm

Hi,

Do you have a bit more context about the second error? What is the stack trace that you get?

aman_bharat · October 15, 2019, 3:42am

albanD · October 15, 2019, 3:25pm

cc @ptrblck what is the reason for this error on CUDA? Do you know where it is coming from?

ptrblck · October 15, 2019, 5:14pm

Usually we see this kind of error for IPC using CUDA tensors on Windows (e.g. as explained here).
However, based on the output it looks like a Linux system is being used.

@aman_bharat could you post a reproducible code snippet, so that we can have a look?
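
For comparison, here is a minimal sketch of one way to set this up so that no CUDA tensor has to cross a process boundary. It is an assumption about the setup, not a diagnosis of the error above, since the original code was not posted. The model is built inside the child process instead of being passed to it, and only plain Python data is sent back. The names `gpu_worker`, `cpu_worker`, `cpu_heavy` and `run_both` are hypothetical, and `torch.nn.Linear(4, 2)` stands in for the real network.

```python
import multiprocessing as mp


def cpu_heavy(n):
    """Stand-in for the CPU-bound function: sum of squares below n."""
    return sum(i * i for i in range(n))


def gpu_worker(result_queue):
    """Build and run the model entirely inside the child process."""
    # Imported here so the parent process never touches torch or CUDA.
    import torch

    # Falls back to the CPU when no GPU is visible.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(4, 2).to(device)  # hypothetical stand-in network
    with torch.no_grad():
        out = model(torch.ones(1, 4, device=device))
    # Send plain Python lists back, so no CUDA storage is shared via IPC.
    result_queue.put(out.cpu().tolist())


def cpu_worker(n, result_queue):
    """Run the CPU-bound function and report its result."""
    result_queue.put(cpu_heavy(n))


def run_both(n=1000):
    """Start both workers with the 'spawn' start method and collect results."""
    ctx = mp.get_context("spawn")  # start method named in the first error
    gpu_queue, cpu_queue = ctx.Queue(), ctx.Queue()
    procs = [
        ctx.Process(target=gpu_worker, args=(gpu_queue,)),
        ctx.Process(target=cpu_worker, args=(n, cpu_queue)),
    ]
    for p in procs:
        p.start()
    # Read the results before join() so a child never blocks on its queue.
    # Note: get() waits forever if a worker crashes; this sketch has no timeout.
    gpu_out, cpu_out = gpu_queue.get(), cpu_queue.get()
    for p in procs:
        p.join()
    return gpu_out, cpu_out
```

Call `run_both()` from under an `if __name__ == "__main__":` guard. With 'spawn' the child re-imports the main module, so any unguarded code that starts processes would run again in the child.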