I use PyTorch with Python 2 for some deep learning tasks.
In my Dataset class, I use Numba's cuda.jit to preprocess some 3D point cloud data. When I set num_workers=0 in the DataLoader, everything works, but it is slow.
When I set num_workers > 0, I get a CudaSupportError. The reason is that the workers are forked from the main process: if CUDA has already been initialized in the main process, the subprocesses try to initialize it again, and this triggers the error.
According to https://pytorch.org/docs/stable/notes/multiprocessing.html#multiprocessing-cuda-note, the best solution is to set the start method to "spawn". However, my whole project is written in Python 2, which does not support multiprocessing.set_start_method('spawn').
Then I tried to initialize the CUDA-related tensors after the DataLoader creates its subprocesses, i.e., in the first batch. This worked for the first epoch, but the same error occurred in the second epoch. It turns out that the DataLoader creates new subprocesses for every epoch and destroys them when the loop exits.
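My lazy-initialization attempt looked roughly like this (a sketch: `_init_cuda` is a placeholder for the real numba.cuda setup, and the real class subclasses torch.utils.data.Dataset):

```python
class PointCloudDataset(object):  # subclasses torch.utils.data.Dataset in the real code
    def __init__(self):
        # Defer all CUDA work so nothing touches the GPU in the
        # main process before the workers are forked.
        self._cuda_ready = False

    def _init_cuda(self):
        # Placeholder for the real setup: compiling cuda.jit kernels,
        # allocating device buffers, etc.
        self._cuda_ready = True

    def __getitem__(self, idx):
        if not self._cuda_ready:
            # Runs inside the worker process on its first batch.
            self._init_cuda()
        return idx  # placeholder for the preprocessed point cloud
```

This avoids the error only while the worker processes stay alive, which is why it broke at the second epoch.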
I wonder whether the DataLoader can reuse the same subprocesses across epochs. Then I could initialize CUDA once, when those subprocesses are first created, and avoid the CUDA re-initialization error.