Using DDP to train a PyTorch model on a distributed GPU system

We use DDP to train our PyTorch model on a distributed GPU system. I have noticed that some distributed-training demos for PyTorch manually synchronize the initial weights across all nodes and then average the gradients from each node before updating the model parameters. I am wondering whether these extra operations are still necessary when the model is already wrapped in DDP.
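
For reference, here is a minimal sketch of what I understand the DDP setup to look like (assuming a single node with multiple GPUs, the `nccl` backend, and a launch via `torchrun`; the model and data here are just placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).cuda(local_rank)
    # As I understand it, DDP broadcasts rank 0's parameters to every
    # rank at construction time, so no manual initial-weight sync here.
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(5):
        # Placeholder random batch standing in for real data.
        x = torch.randn(32, 10, device=local_rank)
        y = torch.randn(32, 1, device=local_rank)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        # backward() triggers DDP's all-reduce, which averages gradients
        # across ranks automatically -- no manual averaging step here.
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

This would be launched with something like `torchrun --nproc_per_node=<num_gpus> train.py`. Is the above correct, i.e. does DDP itself handle both the initial broadcast and the gradient averaging, making the manual steps from those demos redundant?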