Multi-GPU using nn.DataParallel doesn't work

Hi, I use nn.DataParallel to train on multiple GPUs; however, multi-GPU doesn't seem to speed anything up.
The details for one epoch are below:
Two GPUs:

Batch_size    Loss    Time (s)
64            256     320
128           140     324
256            78     330

And Using One GPU:

Batch_size    Loss    Time (s)
64            250     290
128           137     298
256            77     260

The result really confuses me. So what exactly is multi-GPU supposed to speed up?
The time per epoch is almost the same.
PS:
I use multi-GPU just with nn.DataParallel:

model = model.cuda()
model = nn.DataParallel(model)
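
For context, the rest of my loop is roughly like this (the dataset and model below are just stand-ins, not my real code):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model, only to show how I wrap things.
dataset = TensorDataset(torch.randn(1000, 3, 224, 224),
                        torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=128, shuffle=True)

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 10))
model = model.cuda()
model = nn.DataParallel(model)        # use all visible GPUs

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)   # the batch is split across GPUs here
    loss.backward()
    optimizer.step()
```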

Update:

Is multi-GPU mainly meant for a larger batch_size, i.e. larger batch, faster convergence?

First of all, check GPU utilization; you can do this with watch -n 1 nvidia-smi.
Utilization should be around 90% on both GPUs. If it is not:

  1. Data feeding is too slow (use more DataLoader workers or a faster disk such as an SSD); see the sketch after this list.
  2. Your network is not complex enough to saturate both GPUs.
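
For point 1, the usual first step is to raise num_workers on the DataLoader and enable pin_memory. A minimal sketch, assuming a standard map-style dataset (the dataset here is a dummy stand-in for yours):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(10000, 3, 224, 224),
                        torch.randint(0, 1000, (10000,)))

loader = DataLoader(dataset,
                    batch_size=256,
                    shuffle=True,
                    num_workers=8,     # several CPU processes prepare batches in parallel
                    pin_memory=True)   # pinned memory allows faster, async copies to the GPU

for inputs, targets in loader:
    inputs = inputs.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
    # ... forward / backward ...
```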

I advise you to check training with e.g. ResNet-101 on random data and see whether there is any speed-up per epoch.
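
Something like this rough sketch (the numbers of iterations and the tiny setup are illustrative, not a careful benchmark) would tell you whether DataParallel gives any throughput gain at all on your machine:

```python
import time
import torch
import torch.nn as nn
import torchvision.models as models

def benchmark(model, n_iters=50, batch_size=128):
    model = model.cuda()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Random inputs keep the data pipeline out of the measurement.
    inputs = torch.randn(batch_size, 3, 224, 224).cuda()
    targets = torch.randint(0, 1000, (batch_size,)).cuda()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()
    return time.time() - start

print("single GPU      :", benchmark(models.resnet101()))
print("nn.DataParallel :", benchmark(nn.DataParallel(models.resnet101())))
```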


Thanks for your reply.
GPU utilization is about 60-70% on both GPUs.
What confuses me is that the time for one epoch with two GPUs is almost the same as with one GPU, even though both GPUs are working.