How CUDA is initialized in DDP

As far as I know, within a single process we can prevent CUDA from being initialized more than once with the following code in
pytorch/Context.h at master · pytorch/pytorch · GitHub

THCState* lazyInitCUDA() {
  // std::call_once runs the lambda at most once per process
  std::call_once(thc_init, [&] {
    thc_state = detail::getCUDAHooks().initCUDA();
  });
  return thc_state.get();
}

When we use DDP and set start_method (spawn, fork, or forkserver), there are multiple processes. How can we prevent CUDA from being initialized many times?

I would also like to know how CUDA is initialized in DDP.

Thanks a lot!

The CUDA context is initialized in DDP the same way as in a single-process Python script, i.e. by the first CUDA operation in each process. You can set the device via torch.cuda.set_device, mask the other devices, etc., to make sure the default device (or any other unwanted device) is not used in the current Python process.
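As a rough illustration of the two options mentioned above, here is a minimal sketch of what each DDP worker process could do before its first CUDA operation. The helper names pick_device_index and worker are hypothetical (they are not part of PyTorch), and the sketch assumes one GPU per process, with the local rank passed in by whatever launcher is used.

```python
import os


def pick_device_index(local_rank: int, mask: bool = False) -> int:
    """Return the CUDA device index this process should use (hypothetical helper).

    With mask=True, every other GPU is hidden via CUDA_VISIBLE_DEVICES, so the
    single remaining GPU is seen as index 0 inside this process.
    """
    if mask:
        # Must be set before the first CUDA call in this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
        return 0
    return local_rank


def worker(local_rank: int, mask: bool = False) -> None:
    """Body of one DDP process (hypothetical name)."""
    import torch  # imported late so any masking above is already in place

    index = pick_device_index(local_rank, mask)
    if not torch.cuda.is_available():
        return  # nothing to pin on a CPU-only machine

    # Make `index` the current device so later CUDA work does not
    # fall back to the default device 0.
    torch.cuda.set_device(index)

    # A CUDA operation placed explicitly on the chosen device.
    x = torch.ones(1, device=f"cuda:{index}")
    print(f"local rank {local_rank} is using {x.device}")
```

When the script is launched with torchrun, the local rank is available in the LOCAL_RANK environment variable, so each process could call worker(int(os.environ["LOCAL_RANK"])).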
