PyTorch Forums
Is average the correct way for the gradient in DistributedDataParallel with multi nodes?
distributed
Lausanne
February 10, 2019, 2:56am
@coincheung
Your lr in torch.distributed (DDP) mode should be 0.005: DDP averages the gradients across processes rather than summing them, so the learning rate has to be adjusted accordingly.
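A minimal sketch of why averaging versus summing gradients changes the effective learning rate. The worker gradients and `world_size` here are hypothetical, and the sketch does not derive the specific value 0.005 from the original poster's setup; it only shows that an update using the averaged gradient with lr scaled up by `world_size` matches an update using the summed gradient with the base lr.

```python
# Hypothetical per-worker gradients for a single scalar parameter.
world_size = 4
per_worker_grads = [0.8, 1.2, 1.0, 0.6]

avg_grad = sum(per_worker_grads) / world_size  # what DDP's allreduce-average yields
sum_grad = sum(per_worker_grads)               # what plain summing would yield

lr_sum = 0.005                 # base lr used with summed gradients (assumed)
lr_avg = lr_sum * world_size   # averaging shrinks the gradient by 1/world_size

update_with_avg = lr_avg * avg_grad
update_with_sum = lr_sum * sum_grad

# The two update schemes coincide: lr_avg * (g/N) == lr_sum * g with lr_avg = N * lr_sum
print(update_with_avg, update_with_sum)  # both 0.018
```

In other words, if a recipe was tuned for summed gradients, dividing (or keeping) the lr appropriately when switching to DDP's averaged gradients preserves the same step size.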
Related topic: Training performance degrades with DistributedDataParallel