Why is training on ImageNet-12 with multiple GPUs (four) slower than with one GPU on my old machine?

My code is a MobileNet implementation I found on GitHub: https://github.com/marvis/pytorch-mobilenet/blob/master/main.py

I have hit the same problem when using multiple GPUs for both inference and training with torch.nn.DataParallel.
You can reproduce the problem with that code.
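For reference, this is the usual DataParallel pattern (a minimal sketch, assuming PyTorch is installed; the tiny model here is a hypothetical stand-in for MobileNet). DataParallel replicates the model and scatters/gathers every batch on each forward pass, and that per-batch overhead is a common reason multi-GPU can end up slower than a single GPU, especially with slow interconnects or small per-GPU batches:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for MobileNet, so the sketch is self-contained
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

if torch.cuda.device_count() > 1:
    # Replicates the model to every GPU and splits each batch across them;
    # results are gathered back on GPU 0 every iteration.
    model = nn.DataParallel(model).cuda()
    device = "cuda"
else:
    device = "cpu"  # fallback so the sketch also runs without GPUs

x = torch.randn(16, 3, 32, 32).to(device)
out = model.to(device)(x)
print(out.shape)  # torch.Size([16, 10])
```

If the model is small relative to the batch, the replicate/scatter/gather cost per iteration can dominate the actual compute.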

I finally solved the inference case with the multiprocessing module, but I don't know how to change my code for training. If anyone knows about this, please tell me.
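For training, the commonly recommended alternative to DataParallel is torch.nn.parallel.DistributedDataParallel, which runs one process per GPU instead of one process driving all GPUs. Below is a minimal single-process sketch of the DDP wrapping itself (my assumptions: PyTorch is installed, the tiny model is a hypothetical stand-in for MobileNet, and I use the CPU-capable "gloo" backend with world_size=1 so it runs anywhere; a real multi-GPU run would spawn one such process per GPU, e.g. with torch.multiprocessing.spawn, and use the "nccl" backend):

```python
import os
import tempfile
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical tiny model standing in for MobileNet
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Single-process process group so the sketch runs without GPUs;
# in a real setup each spawned worker calls this with its own rank.
init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
dist.init_process_group("gloo", init_method=f"file://{init_file}",
                        rank=0, world_size=1)

ddp_model = DDP(model)  # on GPU you would pass device_ids=[local_rank]
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on random data
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
out = ddp_model(x)
loss = criterion(out, y)
optimizer.zero_grad()
loss.backward()  # DDP all-reduces gradients across workers here
optimizer.step()

dist.destroy_process_group()
```

Unlike DataParallel, DDP only communicates gradients (overlapped with the backward pass) rather than re-scattering the model and batch every iteration, which is why it usually scales much better.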