RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I’m getting the above error even though I’m not using multiprocessing.

Hi,

Are you using a DataLoader with multiple workers? If so, you might want to have your DataLoader work with CPU tensors (and pinned memory) and send the tensors to the GPU in your training loop. Alternatively, don't use multiple workers, to avoid any issue with multiprocessing.
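A minimal sketch of that pattern (the dataset, shapes, and batch size here are placeholders):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder CPU-only dataset; the workers never touch CUDA.
    dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                            torch.randint(0, 10, (1000,)))

    # pin_memory=True puts batches in page-locked memory, which speeds up
    # the later host-to-device copy and allows non_blocking transfers.
    loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

    device = torch.device('cuda')
    for inputs, targets in loader:
        # The GPU transfer happens here, in the main process.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # ... forward / backward / optimizer step ...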


Yes, I was using multiple workers. Not using them in my DataLoader fixed the issue. Thanks!

For me, using multiple workers is necessary, and it used to work (in July), but after an update to my PyTorch distro I started getting this error.

Setting num_workers=1 didn’t fix this issue.

Opened new thread here.


Maybe I am missing something, but can't I use a DataLoader with tensors on the GPU?

It is tricky because CUDA does not allow you to easily share data across processes, so the transfer from the worker process that loads the sample to the main one won't be optimal.
You want to get a tensor from pinned memory and send it to the GPU in the main process to avoid such issues.


I shared my code here. I followed the workaround and added torch.multiprocessing.set_start_method('spawn') before my DummyDataSet class, but now it's throwing this error:

  File "/usr/local/lib/python3.6/multiprocessing/context.py", line 242, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
.....
RuntimeError: DataLoader worker (pid 1095) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
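For reference, a common way around the "context has already been set" error (a sketch, assuming the start method can still be set before any worker or CUDA context is created) is to guard the call and pass force=True:

    import torch.multiprocessing as mp

    if __name__ == '__main__':
        # Must run before the DataLoader spawns its workers. force=True
        # replaces a start method that was already chosen, which is what
        # the "context has already been set" error complains about.
        mp.set_start_method('spawn', force=True)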

Do you really need to have the tensors on the GPU when you load them? Does returning them in pinned memory not work for you?

After switching to multiple workers, I noticed that the only remaining data-related cost is transferring data from CPU to GPU (in the training loop), so I wondered whether the transfer step could be moved into the dataset class to save training time. When I tried, I got the same error as the author. So I guess we cannot do this, right?
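To illustrate why this fails: with the default 'fork' start method, once CUDA has been initialized in the main process, a forked worker cannot use it, so any .cuda() call inside __getitem__ raises the error at the top of this thread. A minimal, hypothetical reproduction:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class GpuDataset(Dataset):
        """Hypothetical dataset that moves samples to the GPU itself."""
        def __len__(self):
            return 100
        def __getitem__(self, i):
            # With num_workers > 0 and the default 'fork' start method,
            # this raises "Cannot re-initialize CUDA in forked subprocess"
            # once CUDA was already initialized in the parent process.
            return torch.randn(3, 32, 32).cuda()

    torch.cuda.init()  # the parent touches CUDA, so forked workers cannot
    loader = DataLoader(GpuDataset(), batch_size=8, num_workers=2)
    next(iter(loader))  # RuntimeError raised inside a worker

The usual way to reduce the transfer cost instead is the pattern from the earlier sketch: keep the dataset on the CPU, set pin_memory=True, and copy with non_blocking=True in the training loop, so the copy can overlap with GPU compute.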