How to Change NCCL in PyTorch

I ran into an issue when using 16 K80 GPUs today. It turns out that the default behavior (peer-to-peer) in PyTorch will not support more than 8 devices, probably for very good reasons that I don't understand. A proposed solution is to set NCCL_P2P_LEVEL=1 in the environment, but I'm not sure how to actually do that because I have never had to fiddle with NVIDIA environment variables. Is there a PyTorch command that will let me set NCCL_P2P_LEVEL? Where do I change this? I'm building models in a Jupyter Notebook on a Windows machine.

You could set the env variable directly when launching your script, e.g. NCCL_P2P_LEVEL=1 python your_script.py args.
However, NCCL isn't supported on Windows, so you would need to use another backend such as Gloo.
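Since the question is about a Jupyter Notebook, the same variable can also be set from inside the running kernel. The sketch below is an illustration, not an official PyTorch API: the helper names configure_nccl_env and pick_backend are hypothetical, and it assumes NCCL reads NCCL_P2P_LEVEL when a communicator is created, so the variable has to be set before torch.distributed.init_process_group (or the first NCCL collective) runs. It also picks Gloo on Windows, where NCCL isn't available. The IPython magic %env NCCL_P2P_LEVEL=1 in a notebook cell is an equivalent shortcut for the first helper.

```python
import os
import sys


def configure_nccl_env(p2p_level: str = "1") -> None:
    """Set NCCL_P2P_LEVEL for the current process (e.g. the notebook kernel).

    Must run before any NCCL communicator is created, i.e. before
    torch.distributed.init_process_group(...) or the first collective.
    """
    os.environ["NCCL_P2P_LEVEL"] = p2p_level


def pick_backend() -> str:
    """Return a torch.distributed backend name for this OS.

    Assumption: NCCL is unavailable on Windows, so fall back to Gloo there.
    """
    return "gloo" if sys.platform == "win32" else "nccl"


configure_nccl_env("1")
backend = pick_backend()
print("NCCL_P2P_LEVEL =", os.environ["NCCL_P2P_LEVEL"], "| backend =", backend)
```

The resulting name could then be passed as torch.distributed.init_process_group(backend=backend, ...). Note that the environment variable only matters when the NCCL backend is actually used.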

Thanks @ptrblck! Is that the only way of training a model on more than 8 GPUs with PyTorch? Note that I have the model layers distributed across devices in a model parallel manner.

It might depend on your system; you can check how the GPUs are connected to each other via nvidia-smi topo -m.
I'm able to train a ResNet on 16 GPUs without setting any NCCL env variables.
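To inspect the topology from a notebook, running !nvidia-smi topo -m in a cell works, or it can be wrapped in Python as below. This is a minimal sketch: gpu_topology is a hypothetical helper, and it simply returns None when nvidia-smi is not on the PATH or the command fails, rather than raising.

```python
import shutil
import subprocess
from typing import Optional


def gpu_topology() -> Optional[str]:
    """Return the text output of `nvidia-smi topo -m`, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi is not installed or not on PATH.
        return None
    result = subprocess.run(
        [exe, "topo", "-m"], capture_output=True, text=True, check=False
    )
    # Only return the matrix if the command succeeded.
    return result.stdout if result.returncode == 0 else None


topo = gpu_topology()
print(topo if topo is not None else "nvidia-smi not found or failed")
```

The printed matrix and its legend (entries such as NV#, PIX, PHB or SYS) show which GPU pairs share NVLink, a PCIe switch, or only a system-level path.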