nvlink error: Undefined reference to 'XXX' when building PyTorch from source

I’m trying to install PyTorch from source following the official guide. When I run python setup.py develop, it reports many nvlink errors like the one below:

nvlink error   : Undefined reference to '_Z28ncclAllReduceCollNet_min_f64P14CollectiveArgs' in 'dir_to_pytorch/build/nccl/obj/collectives/device/functions.o'

The surrounding build output:

FAILED: nccl_external-prefix/src/nccl_external-stamp/nccl_external-build nccl/lib/libnccl_static.a 
cd /home/wtx/workspace/cpp_project/pytorch/third_party/nccl/nccl && env make CXX=/usr/bin/c++ CUDA_HOME=/home/wtx/.local/cuda-11.8 NVCC=/home/wtx/.local/cuda-11.8/bin/nvcc NVCC_GENCODE=-gencode=arch=compute_86,code=sm_86 BUILDDIR=/home/wtx/workspace/cpp_project/pytorch/build/nccl VERBOSE=0 -j && /usr/bin/cmake -E touch /home/wtx/workspace/cpp_project/pytorch/build/nccl_external-prefix/src/nccl_external-stamp/nccl_external-build
make -C src build BUILDDIR=/home/wtx/workspace/cpp_project/pytorch/build/nccl
make[1]: Entering directory '/home/wtx/workspace/cpp_project/pytorch/third_party/nccl/nccl/src'
make[2]: Entering directory '/home/wtx/workspace/cpp_project/pytorch/third_party/nccl/nccl/src/collectives/device'
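
To see exactly which symbols nvlink cannot resolve, the mangled names can be pulled out of the build output. The following is only a minimal sketch: it assumes the error lines follow the format quoted above, and the helper name undefined_symbols is hypothetical.

```python
import re

# Matches lines such as:
#   nvlink error   : Undefined reference to '<symbol>' in '<object file>'
NVLINK_UNDEFINED = re.compile(
    r"nvlink error\s*:\s*Undefined reference to '([^']+)' in '([^']+)'"
)


def undefined_symbols(log_text):
    """Map each object file to the sorted symbols nvlink could not resolve in it."""
    found = {}
    for symbol, obj in NVLINK_UNDEFINED.findall(log_text):
        found.setdefault(obj, set()).add(symbol)
    return {obj: sorted(symbols) for obj, symbols in found.items()}


# The error line quoted in the question, used here as sample input.
sample = (
    "nvlink error   : Undefined reference to "
    "'_Z28ncclAllReduceCollNet_min_f64P14CollectiveArgs' in "
    "'dir_to_pytorch/build/nccl/obj/collectives/device/functions.o'\n"
)

result = undefined_symbols(sample)
for obj, symbols in result.items():
    print(obj)
    for symbol in symbols:
        print("   " + symbol)
```

Passing a mangled name through c++filt (from binutils) makes it readable: the symbol above demangles to ncclAllReduceCollNet_min_f64(CollectiveArgs*), which points at the CollNet variants of the NCCL collectives.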

PyTorch version: 1.9.0
Python version: 3.10
Can somebody tell me how to solve the problem?

Which NCCL version are you using when building this older PyTorch version from source?

The NCCL version is 2.7.8.

I don’t know why the error is raised, but it seems to be related to CollNet. You could try setting NCCL_COLLNET_ENABLE=0 and check whether that works; alternatively, you might need to install the network plugins.
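
One way to try the first suggestion is to rerun the build with the variable set in its environment. This is a sketch only: NCCL_COLLNET_ENABLE is an NCCL environment variable, but whether it has any effect on this link step is not confirmed in this thread, and the helper name env_with_collnet_disabled is hypothetical.

```python
import os


def env_with_collnet_disabled(base_env=None):
    """Return a copy of the environment with NCCL_COLLNET_ENABLE set to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_COLLNET_ENABLE"] = "0"
    return env


# Small demonstration on a toy environment; the original mapping is not modified.
env = env_with_collnet_disabled({"PATH": "/usr/bin"})
print(env["NCCL_COLLNET_ENABLE"])

# To retry the build with this environment, run from the PyTorch source root:
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "develop"],
#                  env=env_with_collnet_disabled(), check=True)
```

If the nvlink errors persist with the variable set, that suggests the problem lies in how the bundled NCCL sources are compiled rather than in this setting.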