I use torch.distributed to train my model.
When I use torch.multiprocessing.set_start_method('spawn'), GPU memory usage grows as num_workers increases.
However, when I don't call torch.multiprocessing.set_start_method('spawn'), GPU memory usage stays the same regardless of num_workers.
So, should I use spawn to start the worker processes?
What exactly does set_start_method('spawn') do?
And why does increasing num_workers increase GPU memory usage in spawn mode?
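For context, torch.multiprocessing is a thin wrapper around the standard multiprocessing module, so the start-method semantics are the stdlib ones. A minimal stdlib-only sketch (no CUDA involved) of what the choice means:

```python
import multiprocessing as mp

# 'spawn' launches a fresh Python interpreter per child process, while
# 'fork' (the POSIX default) clones the parent's memory copy-on-write.
print(mp.get_all_start_methods())   # on Linux: ['fork', 'spawn', 'forkserver']

# A context object scopes the choice to one place instead of setting it
# globally with set_start_method('spawn'):
ctx = mp.get_context("spawn")
print(ctx.get_start_method())       # -> 'spawn'
```

Because a spawned child is a fresh interpreter, it re-initializes everything the parent already had, which is exactly why per-process state (like a CUDA context) gets duplicated under spawn but not under fork.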
When using the GPU, I believe spawn should be used: according to this multiprocessing best practices page, the CUDA context (~500 MB) cannot be forked. This could also be why you see a growing GPU memory footprint when using more spawned processes, as each spawned process gets its own dedicated CUDA context.
Out of curiosity, can you allocate a different GPU to each process, or do they all have to use the same GPU in your application?
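In case it's useful, here's a rough sketch of the usual one-GPU-per-process layout with torch.multiprocessing.spawn. The names run_worker, main, and WORLD_SIZE, plus the TCP rendezvous address, are my own placeholders, not something from the original poster's code:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

WORLD_SIZE = 2  # assumption: one process per GPU, 2 GPUs

def run_worker(rank, world_size):
    # Pin this process to its own GPU before any CUDA allocation, so each
    # spawned process builds its CUDA context on a different device.
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size,
                            init_method="tcp://127.0.0.1:29500")
    # ... build model, wrap it in DistributedDataParallel, train ...
    dist.destroy_process_group()

def main():
    # Each worker is a fresh interpreter under spawn, hence a fresh
    # CUDA context, but on its own device.
    mp.spawn(run_worker, args=(WORLD_SIZE,), nprocs=WORLD_SIZE)

# call main() from the launch script to start training
```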
I want to bump this post; I'm hitting this exact problem right now. Each additional worker my processes spawn for data loading adds about 500 MiB of GPU memory.
Specifically, I'm running my code on 2 GPUs, so I spawn two processes, each of which initializes its own DataLoader with num_workers=2. I see six processes in total using GPU memory. I expect the two main processes to use GPU memory, but I don't understand why the DataLoader worker processes are using it as well.
If I run my code without DDP, I do not see this issue.
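For anyone hitting the same thing, one mitigation I'd try (a sketch, assuming POSIX and a dataset/collate_fn that never touch CUDA, so forked workers stay valid) is to keep the DDP processes under 'spawn' but fork only the loader workers, via DataLoader's multiprocessing_context argument:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy CPU-only dataset standing in for the real one.
dataset = TensorDataset(torch.arange(8, dtype=torch.float32))

loader = DataLoader(
    dataset,
    batch_size=4,
    num_workers=2,
    multiprocessing_context="fork",  # loader workers forked, not spawned
)
```

The idea is that a forked worker inherits the parent copy-on-write instead of starting a fresh interpreter, so it shouldn't initialize its own CUDA context, but I haven't verified this against this exact setup.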