Multiprocessing CUDA tensors

liangstein · March 20, 2018, 1:27pm

I want to train several replicas of the same neural network architecture on one GPU simultaneously. These replicas don’t share variables during their own training. How do I start the replicas properly? The docs say multiprocessing doesn’t support CUDA tensors unless spawn or forkserver is used, but there is no proper documentation about spawn or forkserver.

albanD · March 20, 2018, 2:03pm

Hi,

If you just want to run multiple independent jobs on a single GPU, there will be absolutely no problem.
Multiprocessing, in this context, means that you want multiple processes to work together and exchange tensors at runtime. That does not seem to be your use case.

liangstein · March 20, 2018, 2:09pm

Hi,
I want to train several replicas of the same network. After each replica has completed, the weights will be exchanged. So although no tensors are exchanged while a replica is training, there will be tensor exchanges after each replica finishes. Can multiprocessing handle this?

albanD · March 20, 2018, 2:14pm

In that case, as stated in the documentation, it will work if you use the spawn or forkserver start method to launch the processes. You can refer to the Python multiprocessing documentation for more details about start methods.
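
To make the spawn suggestion concrete, here is a minimal sketch of the pattern using only the standard library. It is not PyTorch code: `train_replica`, `average_weights`, and `run_replicas` are hypothetical names, and the "weights" are plain lists of floats standing in for a trained model. With PyTorch you would presumably swap in `torch.multiprocessing` (a wrapper around `multiprocessing`), build the model on the GPU inside each child, and send the weights back as CPU data.

```python
import multiprocessing as mp
import random


def train_replica(replica_id, seed, result_queue):
    """Stand-in for one independent training run (hypothetical)."""
    rng = random.Random(seed)
    # A real replica would create its model on the GPU here, inside the child.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    # Send plain CPU data back to the parent once training is done.
    result_queue.put((replica_id, weights))


def average_weights(all_weights):
    """Element-wise mean over a list of equally sized weight lists."""
    n = len(all_weights)
    return [sum(column) / n for column in zip(*all_weights)]


def run_replicas(num_replicas):
    """Start each replica in its own spawned process and collect the results."""
    ctx = mp.get_context("spawn")  # "forkserver" is the other option mentioned
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=train_replica, args=(i, i, result_queue))
        for i in range(num_replicas)
    ]
    for worker in workers:
        worker.start()
    # Read one result per worker before joining, so no child blocks on the queue.
    results = dict(result_queue.get() for _ in workers)
    for worker in workers:
        worker.join()
    return [results[i] for i in range(num_replicas)]


# The guard matters with spawn: each child re-imports this module on startup.
if __name__ == "__main__":
    replica_weights = run_replicas(3)
    print(average_weights(replica_weights))
```

Averaging is just one example of an "exchange" step. The point is that the replicas only communicate after training finishes, so a queue of results is enough.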