Why do multiple GPUs perform slower than a single one?

I am trying to reimplement the LAB model. However, when I train it on the server (4 GTX 1080 Ti), it runs even slower than on my host computer (RTX 2080).

Data loading may be the bottleneck: if the input pipeline cannot supply batches fast enough for four GPUs, they sit idle waiting for data.
Refer to this tutorial. It may help you debug your code if it has any issues.
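One way to check whether data loading is the problem is to time the wait for each batch separately from the training step. The following is a minimal, framework-agnostic sketch, not taken from the tutorial: `profile_loop`, `batches`, and `train_step` are hypothetical names, where `batches` stands for whatever iterable yields your training batches and `train_step` for a function that runs one forward/backward pass. GPU work is usually launched asynchronously, so `train_step` should wait for the GPU to finish before returning (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the step time will look too small.

```python
import time


def profile_loop(batches, train_step, max_steps=50):
    """Return (seconds waiting for data, seconds in train_step, steps run).

    `batches` and `train_step` are placeholders for your own data
    iterable and training function (assumptions, not a library API).
    """
    data_time = 0.0
    step_time = 0.0
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is the data-loading wait
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # should block until the GPU work is done
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        steps += 1
    return data_time, step_time, steps
```

If the data time is a large fraction of the total on the server, the GPUs are probably starved, and raising the number of loader workers or moving the dataset to faster storage is likely to help more than adding GPUs. If the step time dominates instead, the slowdown is more likely to come from communication between the GPUs or from a per-GPU batch size that is too small.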