How can I change the NCCL version in PyTorch?

Hi, I'm using CUDA 11.3, and multi-GPU runs freeze, so I thought it might be solved if I changed torch.cuda.nccl.version…

Also, is there any way to find NCCL 2.10.3 in my environment? apt search nccl doesn't show the 2.10.3 version that torch.cuda.nccl.version() reports. I wonder whether, if I removed 2.10.3, torch would fall back to 2.9.9 as the default.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
Python 3.8.8 (default, Apr 13 2021, 19:58:26) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'
>>> torch.cuda.nccl.version()
(2, 10, 3)
libhttpasyncclient-java/focal 4.1.4-1 all
  HTTP/1.1 compliant asynchronous HTTP agent implementation

libnccl-dev/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Development Files

libnccl2/unknown 2.11.4-1+cuda11.6 amd64 [upgradable from: 2.9.9-1+cuda11.3]
  NVIDIA Collective Communication Library (NCCL) Runtime

libpuppetlabs-http-client-clojure/focal 0.9.0-1 all
  Clojure wrapper around libhttpasyncclient-java

libvncclient1/focal-updates,focal-security 0.9.12+dfsg-9ubuntu0.3 amd64
  API to write one's own VNC server - client library

python-ncclient-doc/focal 0.6.0-2.1 all
  Documentation for python-ncclient (Python library for NETCONF clients)

python3-ncclient/focal 0.6.0-2.1 all
  Python library for NETCONF clients (Python 3)

thanks

The binaries ship with their own CUDA runtime, cuDNN, NCCL, etc. libs, so you won’t be able to change them directly.
You could build PyTorch from source and use your locally installed NCCL via e.g.:

NCCL_INCLUDE_DIR="/usr/include/" NCCL_LIB_DIR="/usr/lib/" USE_SYSTEM_NCCL=1 python setup.py install
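After rebuilding, you could compare the version PyTorch reports against the system header to confirm it really picked up your NCCL. A rough sketch (adjust the header path if your nccl.h lives elsewhere, e.g. under the CUDA toolkit):

import re

import torch

# Parse NCCL_MAJOR / NCCL_MINOR / NCCL_PATCH from the system header.
header_text = open("/usr/include/nccl.h").read()
system_version = tuple(
    int(re.search(rf"#define NCCL_{name}\s+(\d+)", header_text).group(1))
    for name in ("MAJOR", "MINOR", "PATCH")
)

print("system nccl.h :", system_version)
print("torch reports :", torch.cuda.nccl.version())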

I uninstalled the torch I had installed with pip and tried to build from source:

conda install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
conda install -c pytorch magma-cuda110 
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive --jobs 0
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

NCCL_INCLUDE_DIR="/usr/local/cuda-11.3/targets/x86_64-linux/include" NCCL_LIB_DIR="/usr/local/cuda-11.3/targets/x86_64-linux/lib" USE_SYSTEM_NCCL=1 python setup.py install
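
(A quick sanity check that those directories actually contain the NCCL header and library before building; just a sketch using the paths above:)

import glob
import os

include_dir = "/usr/local/cuda-11.3/targets/x86_64-linux/include"
lib_dir = "/usr/local/cuda-11.3/targets/x86_64-linux/lib"

# Both should turn up results if USE_SYSTEM_NCCL=1 is to find anything here.
print("nccl.h   :", os.path.exists(os.path.join(include_dir, "nccl.h")))
print("libnccl* :", glob.glob(os.path.join(lib_dir, "libnccl*")))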

but the setup.py build fails with this error:

--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;fp16;/root/anaconda3/lib/libmpicxx.so;/root/anaconda3/lib/libmpi.so;gloo;tensorpipe;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   USE_COREML_DELEGATE     : OFF
-- Configuring incomplete, errors occurred!

Could you check in the build logs what exactly failed?

It turned out to be an NCCL P2P issue, not the NCCL version in PyTorch. Thanks!
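
For anyone hitting the same hang: one quick diagnostic is to disable NCCL peer-to-peer transfers and see whether the multi-GPU run stops freezing. A minimal sketch (the script name and launch command below are just examples; NCCL_P2P_DISABLE has to be set before the process group is created):

import os

# Force NCCL to avoid P2P transfers; if the hang disappears, the problem is in
# the peer-to-peer path rather than the NCCL version.
os.environ["NCCL_P2P_DISABLE"] = "1"

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")     # expects env vars set by the launcher
local_rank = int(os.environ["LOCAL_RANK"])  # provided by torchrun
torch.cuda.set_device(local_rank)

x = torch.ones(1, device="cuda")
dist.all_reduce(x)                          # this is where a P2P hang would show up
print(f"rank {dist.get_rank()}: {x.item()}")
dist.destroy_process_group()

Launched with e.g. torchrun --nproc_per_node=2 p2p_check.py (hypothetical file name). If the hang goes away with P2P disabled, the underlying issue is in the GPU peer-to-peer path rather than in the NCCL build itself.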