RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I’m getting the above error even though I’m not using multiprocessing.

Hi,

Are you using a DataLoader with multiple workers? If so, you might want to have your DataLoader work with CPU tensors (and pinned memory) and send the tensors to the GPU in your training loop. Alternatively, don't use multiple workers, to avoid any issue with multiprocessing.
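A minimal sketch of that pattern (the dataset, shapes, and batch size here are placeholders):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder CPU-only dataset; the workers never touch CUDA.
    dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                            torch.randint(0, 10, (1000,)))

    # pin_memory=True puts batches in page-locked memory, which speeds up
    # the later host-to-device copy and allows non_blocking transfers.
    loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

    device = torch.device('cuda')
    for inputs, targets in loader:
        # The GPU transfer happens here, in the main process.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # ... forward / backward / optimizer step ...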


Yes, I was using multiple workers. Not using them in my DataLoader fixed the issue. Thanks!

For me, using multiple workers is necessary, and it used to work (in July), but after an update to my PyTorch distro I started getting this error.

Setting num_workers=1 didn’t fix this issue.

Opened new thread here.


Maybe I am missing something, but can't I use a DataLoader with tensors on the GPU?

It is tricky because CUDA does not allow you to easily share data across processes, so the transfer from the worker process that loads the sample to the main one won't be optimal.
You want to get a tensor from pinned memory and send it to the GPU in the main process to avoid such issues.


I shared my code here. I followed the workaround and added torch.multiprocessing.set_start_method('spawn') before my DummyDataSet class, but now it's throwing this error:

  File "/usr/local/lib/python3.6/multiprocessing/context.py", line 242, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
.....
RuntimeError: DataLoader worker (pid 1095) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
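For reference, a common way around the "context has already been set" error (a sketch, assuming the start method can still be set before any worker or CUDA context is created) is to guard the call and pass force=True:

    import torch.multiprocessing as mp

    if __name__ == '__main__':
        # Must run before the DataLoader spawns its workers. force=True
        # replaces a start method that was already chosen, which is what
        # the "context has already been set" error complains about.
        mp.set_start_method('spawn', force=True)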

Do you really need to have the tensors on the GPU when you load them? Does returning them in pinned memory not work for you?

After switching to multiple workers, I noticed that the only remaining data-related cost is transferring data from CPU to GPU (in the training loop), so I wondered whether the transfer step could be moved into the dataset class to save training time. When I tried, I got the same error as the author. So I guess we cannot do this, right?
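To illustrate why this fails: with the default 'fork' start method, once CUDA has been initialized in the main process, a forked worker cannot use it, so any .cuda() call inside __getitem__ raises the error at the top of this thread. A minimal, hypothetical reproduction:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class GpuDataset(Dataset):
        """Hypothetical dataset that moves samples to the GPU itself."""
        def __len__(self):
            return 100
        def __getitem__(self, i):
            # With num_workers > 0 and the default 'fork' start method,
            # this raises "Cannot re-initialize CUDA in forked subprocess"
            # once CUDA was already initialized in the parent process.
            return torch.randn(3, 32, 32).cuda()

    torch.cuda.init()  # the parent touches CUDA, so forked workers cannot
    loader = DataLoader(GpuDataset(), batch_size=8, num_workers=2)
    next(iter(loader))  # RuntimeError raised inside a worker

The usual way to reduce the transfer cost instead is the pattern from the earlier sketch: keep the dataset on the CPU, set pin_memory=True, and copy with non_blocking=True in the training loop, so the copy can overlap with GPU compute.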