Hi,
I am running the same training code with and without the line:
model = nn.DataParallel(model, device_ids=[0,1])
I am surprised that training with multiple GPUs gets a much better result.
I thought there might be some randomness in the training procedure, so I repeated the single-GPU and multi-GPU training several times. The multi-GPU training always gets a result about 2% better than the single-GPU training, and 2% is not a negligible difference in my task.
I am using ResNet-50 with my own data. I replaced the last fc layer, and dropout is used.
So I want to know: how can this be explained? Are there some operations that work differently in multi-GPU mode?
Thank you