PyTorch DDP with fork

The multiprocessing best-practices section of the documentation states:

“The CUDA runtime does not support the fork start method; either the spawn or forkserver start method are required to use CUDA in subprocesses”

Does this mean that I can’t write a DDP training script that runs on GPUs with ‘fork’?

I haven’t found a clear answer to this, and I’m not sure what “CUDA runtime” means in the docs. In my specific use case, I kinda have to use ‘fork’ so I can pass objects, such as data, through shared memory.

If so, what are the limitations of using mp.Process with the fork start method?

Hey @amirhf,

It’s OK to fork a process as long as the parent process has not yet created a CUDA runtime/context. The CUDA context is created lazily, the first time a process creates CUDA tensors or runs CUDA operations.
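A minimal sketch of that safe ordering, using only the standard library (a real DDP script would use torch.multiprocessing and dist.init_process_group; the worker comments mark where CUDA would first be touched). This assumes a Unix-like OS, since the fork start method is unavailable on Windows:

```python
import multiprocessing as mp

def worker(rank, results):
    # In a real DDP script, this is the first place that should touch CUDA,
    # e.g. torch.cuda.set_device(rank) followed by model.to(rank): the CUDA
    # context is then created lazily here, in the child, never in the parent.
    results.append(rank)

def main():
    # Forking is safe because the parent has done no CUDA work up to this point.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        results = manager.list()
        procs = [ctx.Process(target=worker, args=(r, results)) for r in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return sorted(results)

if __name__ == "__main__":
    print(main())
```

The key point is the ordering: the parent only forks, and each child initializes its own CUDA context after the fork.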

“In my specific use case, I kinda have to use ‘fork’ so I can pass object like data with shared memory.”

I would expect sharing PyTorch tensors through shared memory (e.g. Tensor.share_memory_()) to work in spawn mode as well. Did you hit any errors when doing that?
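As an illustration that shared memory does not require fork, here is a standard-library sketch (no PyTorch, so it only mirrors the idea): the parent creates a named shared-memory block, spawns a child, and the child writes into the same block.

```python
from multiprocessing import get_context, shared_memory

def child(name):
    # Attach to the block the parent created and write into it.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[0] = 42
    shm.close()

def demo():
    # spawn, not fork: the child is a fresh interpreter, yet it can still
    # attach to the parent's shared-memory block by name.
    ctx = get_context("spawn")
    shm = shared_memory.SharedMemory(create=True, size=4)
    try:
        p = ctx.Process(target=child, args=(shm.name,))
        p.start()
        p.join()
        return shm.buf[0]  # the child's write is visible to the parent
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    print(demo())
```

PyTorch's own shared-memory tensors work similarly: the handle is passed to the child, so the child does not need to inherit the parent's address space via fork.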