RuntimeError: Distributed package doesn't have NCCL built in

I am trying to fine-tune a ProtGPT-2 model using the following libraries and packages:
[image: list of libraries and packages, not preserved]

I am running my scripts on a cluster with SLURM as the workload manager and Lmod as the environment module system. I have also created a conda environment and installed all the dependencies I need from Hugging Face Transformers. The cluster has multiple GPUs and CUDA 11.7.
However, when I run my script to train the model, I get the following error:

  File "protGPT_trainer.py", line 475, in <module>
    main()
  File "protGPT_trainer.py", line 438, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/trainer.py", line 1702, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 257, in __init__
    dist.init_distributed(dist_backend=self.dist_backend,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 656, in init_distributed
    cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 36, in __init__
    self.init_process_group(backend, timeout, init_method, rank, world_size)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 40, in init_process_group
    torch.distributed.init_process_group(backend,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

My script to run the training is this:

I have also been checking related cases on GitHub, Stack Overflow, and the PyTorch forums, but most of them don’t have a clear answer.
I’d like to know whether there is a solution for this error and how I can approach it.

I am new to this topic, so I am happy to answer additional questions to clarify the case.

Could you post the output of python -m torch.utils.collect_env, please? It seems you might have installed a PyTorch binary without NCCL support (possibly the CPU-only binary).
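
The report is long, but only a few of its lines are needed to spot a build without CUDA. As a rough sketch (the helper names `key_fields` and `looks_cpu_only` are made up for illustration, and the check is only a heuristic based on the report format shown in this thread), they can be pulled out like this:

```python
def key_fields(report):
    """Extract the few collect_env lines that reveal a CPU-only build."""
    wanted = ("PyTorch version", "CUDA used to build PyTorch", "Is CUDA available")
    fields = {}
    for line in report.splitlines():
        # Split "name: value" at the first colon only.
        name, sep, value = line.partition(":")
        if sep and name.strip() in wanted:
            fields[name.strip()] = value.strip()
    return fields


def looks_cpu_only(report):
    """True when the report says PyTorch was built without CUDA."""
    return key_fields(report).get("CUDA used to build PyTorch") == "None"


# Shortened sample in the collect_env format (assumed input for the demo).
sample = """PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: None
Is CUDA available: False"""

print(key_fields(sample))
print(looks_cpu_only(sample))
```

Other values (for example "Could not collect") are possible in that field, so a False result here does not prove that the build has CUDA support.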

Hello! Thank you for answering. I have probably installed something incorrectly; I am still learning how to work with PyTorch.
Here is the output:

Collecting environment information…
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Stream 8 (x86_64)
GCC version: (GCC) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.0 (default, Nov 6 2019, 21:49:08) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.18.0-240.22.1.el8_3.x86_64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.12.1
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.23.5 py38h14f4228_0
[conda] numpy-base 1.23.5 py38h31eccc5_0
[conda] pytorch 1.12.1 cpu_py38hb1f1ab4_1
[conda] tensorflow 2.4.1 mkl_py38hb2083e0_0
[conda] tensorflow-base 2.4.1 mkl_py38h43e0292_0

It would be great if you could give me some advice on how to deal with this.

Yes, you have installed the CPU-only conda binary:

pytorch 1.12.1 cpu_py38hb1f1ab4_1

as already speculated.
You can select the desired CUDA version from the install matrix here, copy/paste the command, and it should install the right binaries.
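
The build string in the last column of the conda listing is what gives the CPU-only binary away. A small sketch of that reading, assuming the naming seen in this thread (the function name `conda_build_flavor` is hypothetical, and the CUDA example string in the demo is an assumption about how CUDA builds are usually named, not something taken from the posts):

```python
import re


def conda_build_flavor(build_string):
    """Classify a conda build string as 'cpu', 'cuda', or 'unknown'."""
    # Build strings are underscore-separated tokens, e.g. "cpu_py38hb1f1ab4_1".
    tokens = re.split(r"[_\-]", build_string.lower())
    if "cpu" in tokens:
        return "cpu"
    if any(tok.startswith("cuda") for tok in tokens):
        return "cuda"
    return "unknown"


for build in ("cpu_py38hb1f1ab4_1", "py3.10_cpu_0", "py3.10_cuda11.8_cudnn8.7.0_0"):
    print(build, "->", conda_build_flavor(build))
```

A result of 'unknown' just means the string carries no marker either way, so the collect_env output remains the more reliable source.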

Thank you so much! It solved my error!

Hi. I have this problem too. Here is the output of the command you suggested, @ptrblck:

❯ arch=arm64 python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2 (x86_64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.25.0
Libc version: N/A

Python version: 3.10.11 (main, Apr 20 2023, 16:12:28) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.2-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.1
[conda] Could not collect

How to solve this problem?

Based on the posted environment, it seems you are using an Apple M2 Max, which supports neither NCCL nor CUDA and, being a laptop, also doesn’t have multiple GPUs, so I’m unsure what your exact use case is and why you want to use NCCL.
Could you explain your use case a bit more, please?

@ptrblck

Hey, I am having the same issue, please help me.
This is my output from “python -m torch.utils.collect_env”:

PyTorch version: 1.8.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA RTX A2000
GPU 1: NVIDIA GeForce RTX 3060 Ti

Nvidia driver version: 531.41
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.8.0+cu111
[pip3] torchvision==0.9.0+cu111
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py38h2bbff1b_0
[conda] mkl_fft 1.3.1 py38h277e83a_0
[conda] mkl_random 1.2.2 py38hf11a4ad_0
[conda] numpy 1.23.5 py38h3b20f71_0
[conda] numpy-base 1.23.5 py38h4da318b_0
[conda] torch 1.8.0+cu111 pypi_0 pypi
[conda] torchvision 0.9.0+cu111 pypi_0 pypi

I should add that I don’t want to update either the CUDA or the PyTorch version.

THANK YOU!!!

What does:

torch.cuda.nccl.is_available(torch.randn(1).cuda())
torch.cuda.nccl.version()

return?
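
Note that these two calls assume a CUDA-enabled build, since `.cuda()` raises on a CPU-only install before NCCL is even checked. A more defensive variant might look like the sketch below; `probe_nccl` is a made-up helper name, and it relies on `torch.distributed.is_nccl_available()` being present, which I believe holds for recent releases but is an assumption for older ones:

```python
def probe_nccl():
    """Collect NCCL-related facts without requiring a GPU or a CUDA build."""
    info = {"torch": None, "cuda": False, "nccl": False, "nccl_version": None}
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        # PyTorch is not installed; return the defaults.
        return info

    info["torch"] = torch.__version__
    info["cuda"] = torch.cuda.is_available()
    # is_nccl_available() only exists when the distributed package was built.
    info["nccl"] = dist.is_available() and dist.is_nccl_available()
    if info["nccl"]:
        try:
            info["nccl_version"] = torch.cuda.nccl.version()
        except Exception as exc:
            info["nccl_version"] = f"unavailable ({exc})"
    return info


print(probe_nccl())
```

If "nccl" comes back False, the installed binary simply has no NCCL backend, and no separately downloaded library will change that.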

C:\Users\user\anaconda3\lib\site-packages\torch\cuda\nccl.py:16: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    print(torch.cuda.nccl.version())
  File "C:\Users\user\anaconda3\lib\site-packages\torch\cuda\nccl.py", line 36, in version
    return torch._C._nccl_version()
AttributeError: module 'torch._C' has no attribute '_nccl_version'

It looks like I don’t have NCCL. I did try downloading it (the CUDA 11.1 compatible version); the download is a .txz archive containing a library, so I tried copying it to “C:\Users\user\anaconda3\Lib\site-packages”, but it didn’t work.
I also tried installing it from the Anaconda prompt, but I got an error:

PackagesNotFoundError: The following packages are not available from current channels:

  - nccl

Current channels:

  - https://conda.anaconda.org/conda-forge/label/cf202003/win-64
  - https://conda.anaconda.org/conda-forge/label/cf202003/noarch
  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

THANK YOU SO MUCH FOR YOUR HELP!!! YOU ARE A LIFE SAVER!

I cannot reproduce the issue using the 1.8.0+cu111 wheels and see a valid NCCL version, so I’m unsure where the wheel you are using comes from.
Install:

pip install torch==1.8.0+cu111 --index-url https://download.pytorch.org/whl/cu111
Looking in indexes: https://download.pytorch.org/whl/cu111
Collecting torch==1.8.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
...

Code:

>>> import torch
>>> torch.__version__
'1.8.0+cu111'
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
True
>>> torch.cuda.nccl.version()
2708

OK, I understand the issue: I am using NCCL on Windows, which isn’t supported. I switched to gloo, but I am still getting an error:

OSError: [WinError 1455] The paging file is too small for this operation to complete

I tried increasing my virtual memory, but I still get this error.
With one GPU it runs for 10 minutes or so before the error appears, but with two GPUs I can’t even start the code.

Also, how can I run NCCL on Linux? Maybe it’s better to try that?

Yes, this explains why NCCL isn’t available in your setup.
To use NCCL on Linux refer to e.g. this simple example.
I don’t know if it’s better as I’m not using Windows and don’t have any feedback on e.g. gloo/Windows.
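
For scripts that have to run on both platforms, one option is to choose the backend at runtime instead of hard-coding "nccl". This is only a sketch under the assumptions discussed above (NCCL on Linux with a CUDA build, gloo everywhere else); `choose_backend` is a hypothetical helper, and gloo on Windows is not something I have tested:

```python
import platform


def choose_backend(system, cuda_available, nccl_available):
    """Pick a torch.distributed backend name for this machine."""
    # NCCL needs Linux, a working GPU, and a binary built with NCCL.
    if system == "Linux" and cuda_available and nccl_available:
        return "nccl"
    # gloo is the fallback for Windows, macOS, and CPU-only setups.
    return "gloo"


if __name__ == "__main__":
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        print("PyTorch is not installed")
    else:
        nccl = dist.is_available() and dist.is_nccl_available()
        backend = choose_backend(platform.system(), torch.cuda.is_available(), nccl)
        # The chosen name would then be passed to
        # torch.distributed.init_process_group(backend=backend, ...).
        print("backend:", backend)
```

Libraries that initialize the process group themselves (DeepSpeed in the first post, for example) take the backend through their own configuration, so this only applies directly to scripts that call init_process_group on their own.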

Having a similar issue with PyTorch 2.0.1 and CUDA 11.5. Would appreciate your guidance.
Do we need to uninstall PyTorch, then install a version compiled with CUDA? (Should we try the following? pip install torch==2.0.1+cu105 --index-url https://download.pytorch.org/whl/cu105)
Thanks!


python -m torch.utils.collect_env:

Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17

Python version: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: Tesla V100-DGXS-32GB
GPU 1: Tesla V100-DGXS-32GB

Nvidia driver version: 495.29.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1299.804
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4397.77
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0-39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.5.1              h59c8dcf_11    conda-forge
[conda] mkl                       2023.1.0         h6d00ec8_46342
[conda] numpy                     1.24.3          py310ha4c1d20_0    conda-forge
[conda] pytorch                   2.0.1              py3.10_cpu_0    pytorch
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch
[conda] pytorch-mutex             1.0                         cpu    pytorch
[conda] torch                     2.0.1                    pypi_0    pypi

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0    51W / 300W |      0MiB / 32508MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   31C    P0    48W / 300W |      0MiB / 32508MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

>>> import torch
>>> torch.__version__
'2.0.1'
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 torch.cuda.nccl.is_available(torch.randn(1).cuda())

File ~/.conda/envs/bigcode-evaluation-harness-env/lib/python3.10/site-packages/torch/cuda/__init__.py:239, in _lazy_init()
    235     raise RuntimeError(
    236         "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
    237         "multiprocessing, you must use the 'spawn' start method")
    238 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 239     raise AssertionError("Torch not compiled with CUDA enabled")
    240 if _cudart is None:
    241     raise AssertionError(
    242         "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled

You are trying to install a PyTorch binary with CUDA 10.5, which does not exist:

pip install torch==2.0.1+cu105 --index-url https://download.pytorch.org/whl/cu105

You can also visit the URL and will see that it does not exist.
Select a valid CUDA version from our install matrix here, copy/paste the install command into your terminal, and execute it.

The 2.0.1 release ships with CUDA 11.7 and 11.8 while the nightly release ships with 11.8 and 12.1.
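
To make the mismatch easier to catch before running pip, the tag can be checked against the CUDA versions a release actually ships with. The sketch below is illustrative only: `KNOWN_CUDA_TAGS` and `pip_install_command` are made-up names, and the table contains just the combinations mentioned in this thread, not the full install matrix:

```python
# CUDA tags named in this thread for each release; not a complete list.
KNOWN_CUDA_TAGS = {
    "1.8.0": {"cu111"},
    "2.0.1": {"cu117", "cu118"},
}


def pip_install_command(torch_version, cuda_tag):
    """Build the pip command, rejecting CUDA tags the release does not ship."""
    tags = KNOWN_CUDA_TAGS.get(torch_version)
    if tags is None:
        raise ValueError(f"no entry for torch {torch_version} in this table")
    if cuda_tag not in tags:
        raise ValueError(
            f"torch {torch_version} has no {cuda_tag} build; "
            f"choose from {sorted(tags)}"
        )
    return (
        f"pip install torch=={torch_version}+{cuda_tag} "
        f"--index-url https://download.pytorch.org/whl/{cuda_tag}"
    )


print(pip_install_command("2.0.1", "cu118"))
try:
    pip_install_command("2.0.1", "cu105")
except ValueError as err:
    print("rejected:", err)
```

The install matrix on the website remains the authoritative source; the table here would need to be extended by hand for other releases.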

Hello, I also have the same issue with PyTorch 2.0.1 and CUDA 11.8. Please help me solve this problem. This is my error and my output:

RuntimeError: Distributed package doesn't have NCCL built in

python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home Single Language
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 17:59:51) [MSC v.1935 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050 Ti
Nvidia driver version: 531.14
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2400
DeviceID=CPU0
Family=205
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2401
Name=Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.25.0
[pip3] pytorchvideo==0.1.5
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2+cu118
[pip3] torchvision==0.15.2
[conda] libblas                   3.9.0              17_win64_mkl    conda-forge
[conda] libcblas                  3.9.0              17_win64_mkl    conda-forge
[conda] liblapack                 3.9.0              17_win64_mkl    conda-forge
[conda] mkl                       2022.1.0           h6a75c08_874    conda-forge
[conda] numpy                     1.25.0          py311h0b4df5a_0    conda-forge
[conda] pytorchvideo              0.1.5                    pypi_0    pypi
[conda] torch                     2.0.1+cu118              pypi_0    pypi
[conda] torchaudio                2.0.2+cu118              pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi

NCCL is not available on Windows and given you are using only a single GPU there also won’t be a need for it.

@ptrblck, you’ve been so helpful with previous posters, wondering if I can get your eye on this.

I am also receiving this error:

RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18270) of binary

python -m torch.utils.collect_env returns

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.3 (x86_64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.12 (main, Jul  5 2023, 15:34:07) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] numpy==1.25.0
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas                      1.0                         mkl  
[conda] ffmpeg                    4.3                  h0a44026_0    pytorch
[conda] mkl                       2023.1.0         h59209a4_43558  
[conda] mkl-service               2.4.0           py310h6c40b1e_1  
[conda] mkl_fft                   1.3.6           py310h3ea8b11_1  
[conda] mkl_random                1.2.2           py310h3ea8b11_1  
[conda] numpy                     1.25.1                   pypi_0    pypi
[conda] numpy-base                1.25.0          py310ha186be2_0  
[conda] pytorch                   2.0.1                  py3.10_0    pytorch
[conda] torchaudio                2.0.2                    pypi_0    pypi
[conda] torchvision               0.15.2                py310_cpu    pytorch

Also,
torch.cuda.nccl.is_available(torch.randn(1).cuda()) torch.cuda.nccl.version() returns

-bash: syntax error near unexpected token `torch.randn'

macOS no longer supports CUDA and does not support NCCL either, so the error is expected.

python -m torch.utils.collect_env:
Collecting environment information…
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (MinGW.org GCC-6.3.0-1) 6.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.4 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 13:47:18) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650
Nvidia driver version: 522.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=1700
DeviceID=CPU0
Family=205
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=1700
Name=12th Gen Intel(R) Core™ i5-1240P
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.25.1
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2+cu118
[pip3] torchvision==0.15.2+cu118
[conda] numpy 1.25.1 pypi_0 pypi
[conda] torch 2.0.1+cu118 pypi_0 pypi
[conda] torchaudio 2.0.2+cu118 pypi_0 pypi
[conda] torchvision 0.15.2+cu118 pypi_0 pypi

I am also having the same issue. Please help, sir.