Multiprocessing CUDA tensors

I want to train several replicas of the same neural network architecture on one GPU simultaneously. These replicas don’t share variables while each one is training. How do I start the replicas properly? The docs say multiprocessing doesn’t support CUDA tensors, but I can’t find proper documentation on spawn or forkserver.


If you just want to run multiple independent jobs on a single GPU, there will be absolutely no problem.
Multiprocessing, in this context, means multiple processes that work together and exchange tensors at runtime. This does not seem to be your use case.

I want to train several replicas of the same network. Tensors are not exchanged while the replicas are training, but after each replica has completed, the weights will be exchanged. Can multiprocessing handle this?

In that case, as stated in the documentation, it will work if you use the spawn or forkserver start method to launch the processes. See the Python multiprocessing documentation for more details on start methods.
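
For what it's worth, here is a minimal sketch of that pattern: start each replica in its own process with the spawn start method, let it train on its own, and exchange weights only after every replica has finished. It uses only the Python standard library so it runs without a GPU. The names train_replica, average_weights and run_replicas are hypothetical, the "training" is a placeholder that nudges a list of floats, and averaging is just one assumed way to do the exchange. In a real setup you would build and train the model on the GPU inside train_replica.

```python
import multiprocessing as mp
import random
from concurrent.futures import ProcessPoolExecutor


def train_replica(replica_id, steps=100):
    """Hypothetical stand-in for one replica's training loop.

    A real version would create its own model on the GPU here and train it;
    this placeholder only perturbs a small list of floats.
    """
    rng = random.Random(replica_id)  # independent, reproducible seed per replica
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        weights = [w + rng.uniform(-0.01, 0.01) for w in weights]
    return weights  # sent back to the parent only after training is done


def average_weights(all_weights):
    """Assumed exchange step: element-wise mean over all replicas."""
    n = len(all_weights)
    return [sum(ws) / n for ws in zip(*all_weights)]


def run_replicas(num_replicas):
    # Request "spawn" explicitly: each child starts from a fresh interpreter
    # instead of inheriting the parent's state through fork.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=num_replicas, mp_context=ctx) as pool:
        all_weights = list(pool.map(train_replica, range(num_replicas)))
    return average_weights(all_weights)


if __name__ == "__main__":  # needed with spawn: children re-import this file
    print(run_replicas(4))
```

Since torch.multiprocessing is a wrapper around the standard multiprocessing module, the same get_context("spawn") call should carry over if you import that instead. The `if __name__ == "__main__":` guard matters because spawned children import the main script again.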