Training MobileNet on multiple GPUs is slow using PyTorch

I was trying to train MobileNet on multiple GPUs using PyTorch. Watching nvidia-smi, I see that the GPUs are sometimes busy and sometimes idle (GPU utilization drops to 0%). This slows down training a lot.

However, training MobileNet on a single GPU and training ResNet50 on multiple GPUs do not show this issue. I was wondering what is going wrong. Has anyone else run into this problem?

PS:

  1. PyTorch version is 0.4.0
  2. I read all training data into memory.
  3. I have also tried Keras; it does not have this issue.
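
For reference, a simplified sketch of this kind of multi-GPU setup, assuming the whole model is wrapped in `nn.DataParallel` (the model constructor and the in-memory batch below are placeholders, not the actual training script):

```python
import torch
import torch.nn as nn
from torchvision import models

# Placeholder model: torchvision's mobilenet_v2 stands in for the MobileNet
# implementation actually used (which is not shown in the post).
model = models.mobilenet_v2()
model = nn.DataParallel(model).cuda()   # replicate the whole model on every GPU

criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Dummy in-memory batch; in the real run all training data is preloaded into memory.
inputs = torch.randn(256, 3, 224, 224).cuda()
targets = torch.randint(0, 1000, (256,), dtype=torch.long).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```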

It might be that you are seeing the overhead of scattering and gathering the whole model.
You could parallelize just the feature layers, if that's possible.
Have a look at the ImageNet example, where this is done as well.
Krizhevsky described this approach in his "One weird trick" paper.
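
A minimal sketch of that pattern, assuming the model exposes a `features` submodule (torchvision's MobileNetV2 in newer releases does; adapt the attribute name to your own MobileNet implementation):

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch: parallelize only the convolutional feature extractor, as the old
# ImageNet example does for AlexNet/VGG.
model = models.mobilenet_v2()

# Data parallelism only over the conv layers; the classifier stays unreplicated
# on the default GPU, so its parameters are not broadcast/gathered every iteration.
model.features = nn.DataParallel(model.features)
model = model.cuda()

# Training then proceeds as usual; inputs go to the default device and
# DataParallel scatters them across the visible GPUs inside `features`.
x = torch.randn(64, 3, 224, 224).cuda()
out = model(x)  # shape: (64, 1000)
```

Compared with wrapping the whole model, this keeps the per-iteration scatter/gather limited to the feature activations, which can help when the model itself is small relative to the parallelization overhead.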