How to use torch.cuda.set_device with torch.distributed.launch

Hi all,
I’m trying to use torch.distributed.launch with the NCCL backend on two nodes, each of which has a single GPU. The guide I’m following says to call torch.cuda.set_device(local_rank); however, each node has only device 0 available, so I’m not sure whether calling torch.cuda.set_device(0) in both processes is correct.
Either way, I get an error like this:

Traceback (most recent call last):
  File "", line 26, in <module>
  File "/u3/jbaik/pytorch-asr/asr/models/deepspeech_ctc/", line 56, in batch_train
    trainer = NonSplitTrainer(model, **vars(args))
  File "/u3/jbaik/pytorch-asr/asr/models/", line 93, in __init__
    self.model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank)
  File "/home/jbaik/.pyenv/versions/3.7.0/lib/python3.7/site-packages/torch/nn/parallel/", line 134, in __init__
  File "/home/jbaik/.pyenv/versions/3.7.0/lib/python3.7/site-packages/torch/nn/parallel/", line 251, in _dist_broadcast_coalesced
    dist.broadcast(flat_tensors, 0)
  File "/home/jbaik/.pyenv/versions/3.7.0/lib/python3.7/site-packages/torch/distributed/", line 279, in broadcast
    return torch._C._dist_broadcast(tensor, src, group)
RuntimeError: NCCL error in: /u3/setup/pytorch/pytorch/torch/lib/THD/base/data_channels/DataChannelNccl.cpp:322, unhandled system error
If each node has multiple NICs, does NCCL find the proper connection between the nodes? What about the other backends?
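
For reference, here is a minimal sketch of the per-process setup being asked about. It assumes each node is started with torch.distributed.launch and --nproc_per_node=1, in which case the launcher passes --local_rank=0 to the single process on every node, so calling torch.cuda.set_device(0) on both nodes is the expected result. NCCL_DEBUG and NCCL_SOCKET_IFNAME are NCCL environment variables for verbose logging and for pinning the network interface. The helper names (parse_local_rank, configure_nccl, main) and the interface name "eth0" are hypothetical and only for illustration; this is not taken from the original training script.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Read the --local_rank argument that torch.distributed.launch passes to each process."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # Ignore any other arguments that belong to the training script itself.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


def configure_nccl(ifname=None, debug=True):
    """Set NCCL environment variables; call this before the process group is created."""
    if debug:
        # Makes NCCL log which interfaces and transports it selects.
        os.environ["NCCL_DEBUG"] = "INFO"
    if ifname is not None:
        # Restricts NCCL to the named interface (assumption: e.g. "eth0").
        os.environ["NCCL_SOCKET_IFNAME"] = ifname


def main(argv=None):
    local_rank = parse_local_rank(argv)
    configure_nccl(ifname=None)
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no device to select.
        return local_rank
    if torch.cuda.is_available():
        # local_rank indexes GPUs within one node, so it is 0 on a single-GPU node.
        torch.cuda.set_device(local_rank)
    return local_rank
```

Calling main() at the top of the training script, before building the model and wrapping it in DistributedDataParallel, selects the device for that process. If the nodes have several NICs, passing the intended interface name to configure_nccl may help; the NCCL_DEBUG=INFO output should show which interface was actually chosen.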

In my case, I hit the same error when running inside Docker.

With the --network=host parameter, the problem was resolved.

Hi, is the --network=host parameter passed to the nvidia-docker run command on all nodes?