Using system NCCL when building PyTorch from source

I’m trying to build PyTorch from source on an Ubuntu machine in a miniconda environment.

I’m trying to use the existing NCCL libraries installed on the system. I set USE_SYSTEM_NCCL=1, but the build still compiles and links against the NCCL submodule in the PyTorch repository.

I looked at the GitHub issue "Use system NCCL library in PyTorch" (pytorch/pytorch Issue #32286), which states that this is expected behavior. Is there any way to configure the build to use the system NCCL even though it runs inside a conda environment? Or should I just build manually without conda? I don’t want to break dependencies for other users, since several people share this machine.

The environment variable used to work properly, but it is no longer needed: we removed the third_party/nccl folder and now depend on the system NCCL installation. You can simply update your repository and rebuild, since using the system NCCL is now the default behavior.
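Since the build now relies on a system NCCL, a quick sanity check before rebuilding is whether the dynamic linker can find it at all. The sketch below uses the standard library's ctypes.util.find_library; the helper name find_system_nccl is hypothetical, and this is only a rough check I'm assuming is useful here. PyTorch's CMake configuration has its own logic for locating NCCL and may search different paths.

```python
import ctypes.util
from typing import Optional


def find_system_nccl() -> Optional[str]:
    """Return the name of the system NCCL shared library, or None.

    Rough check only (hypothetical helper): on Linux, find_library consults
    the ldconfig cache and a few fallbacks, which is not necessarily where
    PyTorch's CMake build looks for NCCL.
    """
    return ctypes.util.find_library("nccl")


if __name__ == "__main__":
    lib = find_system_nccl()
    if lib is None:
        print("No system NCCL found by the dynamic linker")
    else:
        print(f"Found system NCCL: {lib}")
```

If this reports nothing, the build would likely need to be pointed at the NCCL install location explicitly before relying on the system library.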

Thanks for the reply! I checked out the v2.7.1 commit of PyTorch (e2d141db) before running the build, yet during the build it still pulled NCCL into the third_party folder. Am I doing something wrong? Let me know if you need more details about my environment.

I now see that in commit e2d141db, which I have checked out, build_pytorch_libs calls a function named git_checkout_nccl() unconditionally. Later commits, however, add a check on the USE_SYSTEM_NCCL and USE_NCCL flags before calling it.
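The kind of guard the later commits add can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual PyTorch build code: the helper names env_flag and should_checkout_nccl, the accepted flag spellings, and the defaults (USE_NCCL on, USE_SYSTEM_NCCL off) are all assumptions. The point it illustrates is that the bundled NCCL should only be fetched when NCCL is enabled and the system NCCL was not requested.

```python
import os
from typing import Mapping, Optional

# Accepted spellings for boolean flags (assumption, modeled on common build-script conventions).
TRUTHY = {"1", "ON", "YES", "TRUE", "Y"}
FALSY = {"0", "OFF", "NO", "FALSE", "N"}


def env_flag(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean build flag from env (os.environ by default), falling back to default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().upper()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"Unrecognized value for {name}: {raw!r}")


def should_checkout_nccl(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical guard: fetch the bundled NCCL only if NCCL is enabled
    and the system NCCL was not requested."""
    use_nccl = env_flag("USE_NCCL", default=True, env=env)
    use_system_nccl = env_flag("USE_SYSTEM_NCCL", default=False, env=env)
    return use_nccl and not use_system_nccl
```

With an unconditional checkout, as in e2d141db, no combination of flags prevents the submodule from being fetched; with a guard like this, USE_SYSTEM_NCCL=1 skips it.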

Additionally, setting USE_SYSTEM_NCCL and USE_NCCL in my shell environment, instead of in setup.py, seems to initialize the variables correctly in my build.
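Setting the flags in the environment before setup.py starts is what matters here. In a shell that means exporting them before launching the build. The sketch below does the same thing from Python: it copies the current environment, sets both flags, and runs setup.py with the active interpreter, which is the Python of the activated conda environment. The helper names build_env and build_command are hypothetical, and the default "develop" mode is an assumption that may differ from the build command you actually use.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base (os.environ by default) with the system NCCL requested.

    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["USE_NCCL"] = "1"
    env["USE_SYSTEM_NCCL"] = "1"
    return env


def build_command(mode: str = "develop") -> List[str]:
    """Invoke setup.py with the current interpreter (assumed build mode)."""
    return [sys.executable, "setup.py", mode]


if __name__ == "__main__":
    # Run from the root of the PyTorch checkout; raises if the build fails.
    subprocess.run(build_command(), env=build_env(), check=True)
```

Passing env= to subprocess.run keeps the flags local to the build process, so nothing changes in the shared shell profile that other users of the machine might rely on.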

This seems to have fixed my issue 🙂


Great! Thanks for confirming these env variables work for you, too!