Multi-GPU gets better performance?

Hi,

I am running the same training code with and without the line:
model = nn.DataParallel(model, device_ids=[0, 1])
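
For context, here is a minimal sketch of the toggle I am comparing (the tiny linear model is just a stand-in; my real model is the ResNet-50 described below, and the rest of the training loop is identical in both runs):

```python
import torch
import torch.nn as nn

USE_MULTI_GPU = True  # flip to False for the single-GPU run

# stand-in model; the real one is ResNet-50 (see below)
model = nn.Linear(8, 4).cuda()
if USE_MULTI_GPU:
    # replicate the model on GPUs 0 and 1;
    # each forward pass scatters the input batch along dim 0
    model = nn.DataParallel(model, device_ids=[0, 1])
```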

I am surprised that the multi-GPU training gets a much better result.
I thought it was just randomness in the training procedure, so I repeated the single-/multi-GPU training several times. The multi-GPU training always gets a result about 2% better than the single-GPU training, and 2% is not a negligible difference in my task.

I am using ResNet-50 with my own data. I replaced the last fc layer, and dropout is used.
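
For completeness, a minimal sketch of that model setup (num_classes and the dropout rate are placeholders, not my exact values):

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder for my task's class count

model = models.resnet50(pretrained=True)
# replace the last fc layer with a dropout + linear head sized to my task;
# model.fc.in_features is read before the assignment, so it is still 2048
model.fc = nn.Sequential(
    nn.Dropout(p=0.5),  # placeholder rate
    nn.Linear(model.fc.in_features, num_classes),
)
```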

So how can this be explained? Are there some operations that work differently in multi-GPU mode?

Thank you

The difference is the batch size: when you use nn.DataParallel(), the batch size is a multiple of your single GPU’s one.
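
To illustrate (a sketch assuming two visible GPUs; the model and sizes are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(8, 4).cuda(), device_ids=[0, 1])

# DataParallel scatters the input along dim 0 across the listed GPUs,
# so a batch of 64 means each of the 2 GPUs processes 32 samples
x = torch.randn(64, 8).cuda()
out = model(x)  # per-GPU outputs are gathered back onto GPU 0
print(out.shape)  # torch.Size([64, 4])
```

In other words, if you sized batch_size to saturate a single GPU and then raised it to fill both GPUs, each optimizer step now sees a larger effective batch than in the single-GPU run.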

So actually there’s no difference in performance, right?