DDP training on RTX 4090 (Ada, cu118)

After further investigation, the problem turned out to be caused by the NCCL backend trying to use the peer-to-peer (P2P) transport.
Setting the environment variable NCCL_P2P_DISABLE=1 fixed the issue :+1:
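
For anyone who wants to set this from inside the training script rather than in the shell, here is a rough sketch of how it could look. The helper names `configure_nccl_env` and `init_ddp` are made up for illustration, and it assumes the script is launched with torchrun so that the rank and rendezvous variables are already in the environment. The important part is that the variable is set before the process group is created. `NCCL_DEBUG=INFO` is optional and just makes NCCL log more about what it is doing.

```python
import os


def configure_nccl_env(disable_p2p: bool = True) -> None:
    """Set NCCL environment variables (hypothetical helper).

    Call this before the NCCL process group is created, otherwise
    the setting may have no effect.
    """
    if disable_p2p:
        # Tell NCCL not to use the peer-to-peer (P2P) transport between GPUs
        os.environ["NCCL_P2P_DISABLE"] = "1"
    # Optional: verbose NCCL logging, kept only if the user has not set it already
    os.environ.setdefault("NCCL_DEBUG", "INFO")


def init_ddp() -> None:
    """Initialize the default process group with the NCCL backend (hypothetical helper)."""
    configure_nccl_env()
    # Imported here so the sketch can be read without torch installed;
    # assumes PyTorch with CUDA support is available at run time.
    import torch.distributed as dist

    # Assumes launch via torchrun, which provides RANK, WORLD_SIZE,
    # MASTER_ADDR and MASTER_PORT in the environment.
    dist.init_process_group(backend="nccl")
```

Exporting the variable in the shell before launching should work just as well. Disabling P2P makes NCCL fall back to other transports, so it may cost some throughput.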
