Training on a distributed GPU system is out of sync

We train our model with PyTorch on a distributed multi-node GPU system, with four V100 GPUs per node. We follow the PyTorch demo for distributed training, so each node holds a replica of the model. However, there is a large time gap between the model copies on different nodes; they appear to be out of sync. Has anyone else run into this problem? Any advice on this issue would be appreciated.
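
For context, our script roughly follows the standard DistributedDataParallel pattern from the tutorial. The sketch below is only illustrative, not our actual code: the model, batch sizes, and loop are placeholders, and it assumes one process per GPU launched with torchrun.

```python
# Minimal sketch of the per-process setup we follow (PyTorch DDP tutorial style).
# The model, data, and sizes are placeholders, not our real training code.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    model = DDP(nn.Linear(1024, 1024).to(device), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):
        inputs = torch.randn(32, 1024, device=device)    # placeholder batch
        targets = torch.randn(32, 1024, device=device)
        loss = nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces gradients across ranks here
        optimizer.step()
        if step % 10 == 0:
            # printing rank and step is one way to see which node lags behind
            print(f"rank {rank} step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

We launch something like `torchrun --nnodes=<N> --nproc_per_node=4 train.py` on every node; the exact launch command here is just an example.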