Bottleneck DataParallel

Hey!

I have 4 GPUs in my machine. I trained a model with 1 GPU and then with 4 GPUs, but the training on 4 GPUs is much slower.

I guess it's because the model has to be loaded onto each GPU (that's what makes it so slow).

Does anyone have a solution? Loading my model onto all the GPUs seems too slow.


Hi @MehdiZouitine,

DataParallel has a speed issue because it runs in a single process (one thread per GPU), so it is limited by the Python GIL, and, as you mentioned, it has to re-broadcast the parameters from GPU 0 to GPUs 1-3 at every forward pass.
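For reference, a minimal sketch of the setup described above (toy `nn.Linear` model as a placeholder): the `nn.DataParallel` wrapper replicates the module from GPU 0 to the other GPUs on every forward call, which is exactly the per-step overhead mentioned.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder for your real model

if torch.cuda.device_count() > 1:
    # Single process; the model is re-replicated from GPU 0
    # to the other GPUs at every forward pass.
    model = nn.DataParallel(model).cuda()

x = torch.randn(8, 10)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)  # batch is scattered across GPUs, outputs gathered on GPU 0
print(out.shape)  # torch.Size([8, 1])
```

On a single-GPU or CPU-only machine this just runs the plain module, so the same script works everywhere.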

You can try DistributedDataParallel (DDP) instead, which is faster: it uses one process per GPU, so each process keeps its own copy of the model and only gradients are synchronized. It is the officially recommended multi-GPU scheme.
Here’s the tutorial for DDP.

Also, you can refer to a complete training/evaluation example.
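Here is a minimal DDP sketch (hypothetical toy model and sizes). It uses the `gloo` backend so it also runs on a CPU-only machine; with your 4 GPUs you would use `backend="nccl"`, `world_size=4`, and move the model to `rank`'s device.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # Each process initializes the process group once at startup;
    # parameters are NOT re-broadcast at every forward pass.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 1)          # placeholder for your real model
    ddp_model = DDP(model)                  # only gradients are synchronized
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(3):                      # a few dummy training steps
        opt.zero_grad()
        out = ddp_model(torch.randn(8, 10))
        out.sum().backward()                # gradient all-reduce happens here
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2                          # one process per device
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

In a real run you would also use a `DistributedSampler` in your `DataLoader` so each process sees a different shard of the dataset.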
