How can I use DistributedDataParallel instead of DataParallel?

Hey, sorry for the late reply.
My loss function is defined as follows:
loss = torch.norm(target_flow - input_flow, 2, 1)/batch_size
In https://discuss.pytorch.org/t/is-average-the-correct-way-for-the-gradient-in-distributeddataparallel-with-multi-nodes/34260
there is some discussion on how to calculate the loss. It seems that DDP automatically averages the loss over the batch size, so do I still need to average the loss manually?
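For reference, here is a minimal, stripped-down sketch of the training step I have in mind. The model, optimizer, and input names are placeholders rather than my actual code, and I reduce the per-pixel error with .mean(), which should be equivalent to summing and dividing by the batch size:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def make_ddp_model(model, local_rank):
    # DDP averages *gradients* across processes during backward();
    # the per-batch reduction of the loss still happens locally in each process.
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])

def train_step(ddp_model, optimizer, inputs, target_flow):
    optimizer.zero_grad()
    input_flow = ddp_model(inputs)
    # L2 norm over the flow channels (dim 1) gives a per-sample / per-pixel error.
    epe = torch.norm(target_flow - input_flow, 2, 1)
    # .mean() already divides by the local batch size (same as .sum() / batch_size).
    loss = epe.mean()
    loss.backward()  # DDP all-reduces and averages the gradients over world_size here
    optimizer.step()
    return loss.item()

So my question is whether, on top of this local reduction, any extra division (by batch size or by the number of processes) is needed.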