RuntimeError: Distributed package doesn't have NCCL built in

I am trying to fine-tune a ProtGPT2 model using the following libraries and packages:

[screenshot of library and package versions]

I am running my scripts on a cluster with SLURM as the workload manager and Lmod as the environment module system. I have also created a conda environment and installed all the dependencies I need from Hugging Face Transformers. The cluster has multiple GPUs and CUDA 11.7.
However, when I run my script to train the model, I get the following error:

 File "protGPT_trainer.py", line 475, in <module>
    main()
  File "protGPT_trainer.py", line 438, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
    return inner_training_loop(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/trainer.py", line 1702, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 257, in __init__
    dist.init_distributed(dist_backend=self.dist_backend,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 656, in init_distributed
    cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 36, in __init__
    self.init_process_group(backend, timeout, init_method, rank, world_size)
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 40, in init_process_group
    torch.distributed.init_process_group(backend,
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "/home/user/miniconda3/envs/gptenv/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

My script to run the training is this:

[screenshot of the training launch script]
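
Since the actual script isn't reproduced here, the following is a purely hypothetical sketch of what a SLURM launch script for a Hugging Face Trainer + DeepSpeed job often looks like; every name, path, and flag below is made up for illustration:

#!/bin/bash
#SBATCH --job-name=protgpt          # hypothetical job name
#SBATCH --gres=gpu:2                # hypothetical GPU request
#SBATCH --time=24:00:00

module load cuda/11.7               # hypothetical Lmod module name
source activate gptenv              # activate the conda environment

# ds_config.json and the flags are placeholders, not the poster's real ones
deepspeed protGPT_trainer.py --deepspeed ds_config.json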

I have also been checking some related cases on GitHub, Stack Overflow, and the PyTorch forums, but most of them don’t have a clear answer.
I’d like to know if there is a solution for this error and how I can approach it.

I am new to this topic, so I will answer additional questions to clarify the case.

Could you post the output of python -m torch.utils.collect_env, please? It seems you might have installed a PyTorch binary without NCCL support (maybe the CPU-only binary).
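
You can also check directly in Python whether your binary ships with CUDA and NCCL support:

import torch

print(torch.__version__)                      # a CPU-only build typically lacks a '+cuXXX' suffix
print(torch.version.cuda)                     # None for CPU-only binaries
print(torch.cuda.is_available())
print(torch.distributed.is_nccl_available())  # False if NCCL wasn't compiled in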

Hello! Thank you for answering. I have probably installed something wrong; I am still learning how to work with PyTorch.
Here is the output:

Collecting environment information…
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Stream 8 (x86_64)
GCC version: (GCC) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8.0 (default, Nov 6 2019, 21:49:08) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.18.0-240.22.1.el8_3.x86_64-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.12.1
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.23.5 py38h14f4228_0
[conda] numpy-base 1.23.5 py38h31eccc5_0
[conda] pytorch 1.12.1 cpu_py38hb1f1ab4_1
[conda] tensorflow 2.4.1 mkl_py38hb2083e0_0
[conda] tensorflow-base 2.4.1 mkl_py38h43e0292_0

It would be great if you could give me some advice on how to deal with this.

Yes, you have installed the CPU-only conda binary:

pytorch 1.12.1 cpu_py38hb1f1ab4_1

as already speculated.
You can select the desired CUDA version from the install matrix here, copy/paste the command, and it should install the right binaries.
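
For example, at the time of this thread the matching pip wheel for 1.12.1 with CUDA 11.6 (the closest available build to your cluster's CUDA 11.7) could be installed, after removing the CPU-only conda package, with:

pip install torch==1.12.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

Treat this exact pin as an illustration; the install matrix has the authoritative command for your setup.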


Thank you so much! It solved my error!

Hi. I have this problem too. Here is my output from the command you gave, @ptrblck:

❯ arch=arm64 python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2 (x86_64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.25.0
Libc version: N/A

Python version: 3.10.11 (main, Apr 20 2023, 16:12:28) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.2-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.1
[conda] Could not collect

How to solve this problem?

Based on the posted environment it seems you are using an Apple M2 Max, which supports neither NCCL nor CUDA and also doesn’t use multiple GPUs, as it’s a laptop chip, so I’m unsure what your exact use case is and why you want to use NCCL.
Could you explain your use case a bit more, please?
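
If the goal is just to prototype torch.distributed code locally on the Mac, the gloo backend works on CPU; a minimal single-process sketch:

import os
import torch.distributed as dist

# single-process "world" just to exercise the distributed API on CPU
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend(), dist.get_world_size())
dist.destroy_process_group()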

@ptrblck

Hey, I am having the same issue, please help me :pray:
This is my output from python -m torch.utils.collect_env:

PyTorch version: 1.8.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA RTX A2000
GPU 1: NVIDIA GeForce RTX 3060 Ti

Nvidia driver version: 531.41
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.8.0+cu111
[pip3] torchvision==0.9.0+cu111
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py38h2bbff1b_0
[conda] mkl_fft 1.3.1 py38h277e83a_0
[conda] mkl_random 1.2.2 py38hf11a4ad_0
[conda] numpy 1.23.5 py38h3b20f71_0
[conda] numpy-base 1.23.5 py38h4da318b_0
[conda] torch 1.8.0+cu111 pypi_0 pypi
[conda] torchvision 0.9.0+cu111 pypi_0 pypi

I should also say that I don’t want to update either the CUDA or the PyTorch version.

THANK YOU!!!

What does:

import torch

print(torch.cuda.nccl.is_available(torch.randn(1).cuda()))
print(torch.cuda.nccl.version())

return?

C:\Users\user\anaconda3\lib\site-packages\torch\cuda\nccl.py:16: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    print(torch.cuda.nccl.version())
  File "C:\Users\user\anaconda3\lib\site-packages\torch\cuda\nccl.py", line 36, in version
    return torch._C._nccl_version()
AttributeError: module 'torch._C' has no attribute '_nccl_version'

It looks like I don’t have NCCL. I did try downloading it (the CUDA 11.1 compatible version); the download is a .txz archive with a library inside, so I tried pasting it into “C:\Users\user\anaconda3\Lib\site-packages”, but it didn’t work.
I also tried installing it from the Anaconda prompt, but I got an error:

PackagesNotFoundError: The following packages are not available from current channels:

  - nccl

Current channels:

  - https://conda.anaconda.org/conda-forge/label/cf202003/win-64
  - https://conda.anaconda.org/conda-forge/label/cf202003/noarch
  - https://repo.anaconda.com/pkgs/main/win-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/win-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/msys2/win-64
  - https://repo.anaconda.com/pkgs/msys2/noarch

THANK YOU SO MUCH FOR YOUR HELP!!! YOU ARE A LIFE SAVER!

I cannot reproduce the issue using the 1.8.0+cu111 wheels and see a valid NCCL version, so I’m unsure where the wheel you are using comes from.
Install:

pip install torch==1.8.0+cu111 --index-url https://download.pytorch.org/whl/cu111
Looking in indexes: https://download.pytorch.org/whl/cu111
Collecting torch==1.8.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
...

Code:

>>> import torch
>>> torch.__version__
'1.8.0+cu111'
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
True
>>> torch.cuda.nccl.version()
2708
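
(The integer returned by torch.cuda.nccl.version() encodes the version; 2708 corresponds to NCCL 2.7.8.)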

OK, I understand the issue: I am using NCCL on Windows, which isn’t supported. I switched to gloo, but I am still getting an error:

OSError: [WinError 1455] The paging file is too small for this operation to complete

I tried increasing my virtual memory, but I still get this error.
With one GPU it runs for 10 minutes or so until I get this error, but with 2 GPUs I can’t even run the code.

Also, how can I run NCCL on Linux? Maybe it’s better to try that?

Yes, this explains why NCCL isn’t available in your setup.
To use NCCL on Linux, refer to e.g. this simple example.
I don’t know if it’s better, as I’m not using Windows and don’t have any feedback on gloo on Windows.
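
The linked example isn't reproduced here, but a minimal NCCL check on Linux typically looks like the sketch below, assuming a machine with NVIDIA GPUs, a CUDA-enabled PyTorch build (1.10+, which ships torchrun), and a hypothetical filename ddp_check.py launched via torchrun --nproc_per_node=2 ddp_check.py:

import os
import torch
import torch.distributed as dist

# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# every rank contributes its rank id; after all_reduce each rank holds the sum
t = torch.full((1,), float(dist.get_rank()), device=f"cuda:{local_rank}")
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}: {t.item()}")

dist.destroy_process_group()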