PyTorch Forums
NCCL error in torch._C._dist_broadcast(tensor, src, group) when training on two nodes
acgtyrant
November 7, 2018, 5:12am
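Set the NCCL_SOCKET_IFNAME environment variable to tell NCCL which network interface to use for communication between the nodes. It takes the interface name (for example, the one shown by `ip addr` or `ifconfig`), not an IP address, and it must be set on every node before the process group is initialized.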
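In case it helps, here is a minimal sketch of setting the variable from Python rather than from the shell. The helper name `configure_nccl_interface` and the interface name `eth0` are assumptions for illustration only; substitute whatever interface actually connects your two nodes. The `init_process_group` call is left commented out so that the snippet runs without GPUs or a second node.

```python
import os


def configure_nccl_interface(ifname: str) -> None:
    """Hypothetical helper: point NCCL at a specific network interface.

    NCCL reads NCCL_SOCKET_IFNAME when it sets up its communicators, so
    this has to run before the process group is created.
    """
    os.environ["NCCL_SOCKET_IFNAME"] = ifname
    # Optional: ask NCCL to log details such as which interface it picked.
    os.environ.setdefault("NCCL_DEBUG", "INFO")


# "eth0" is an assumed interface name, not taken from the original post.
configure_nccl_interface("eth0")

# Afterwards, initialize distributed training as usual, e.g.:
# import torch.distributed as dist
# dist.init_process_group(backend="nccl", init_method="env://")
```

The same effect can usually be had by exporting the variable in the shell before launching the training script on each node; doing it in Python just keeps the setting next to the code that depends on it.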