RuntimeError: Distributed package doesn't have NCCL built in

You might want to check this post.
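
In short, the Windows binaries are not built with NCCL, so anything that asks for the nccl backend raises this error. If in doubt, you can check what your build supports with something like this (a minimal sketch using the standard torch.distributed helpers):

import torch.distributed as dist

# NCCL ships only with the Linux/CUDA builds; the Windows wheels support Gloo.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())

# Fall back to a backend the current build actually supports.
backend = "nccl" if dist.is_nccl_available() else "gloo"
print("Selected backend:", backend)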

Thanks… I think you said it should be OK if we are using a single GPU. In my case I am using a single GPU, so it should work.

Hi, I encountered the same issue with Windows not supporting NCCL. I only want to use a single GPU, but I don’t know how to resolve it. Here is the relevant information. Can you provide me with a solution?

Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Professional Edition
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.3.103
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 551.76
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==1.13.1+cu117
[pip3] torchaudio==0.13.1+cu117
[pip3] torchvision==0.14.1+cu117
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.8.0               hd77b12b_0
[conda] mkl                       2023.1.0         h6b88ed4_46358
[conda] mkl-service               2.4.0            py39h2bbff1b_1
[conda] mkl_fft                   1.3.8            py39h2bbff1b_0
[conda] mkl_random                1.2.4            py39h59b6b97_0
[conda] numpy                     1.26.4           py39h055cbcc_0
[conda] numpy-base                1.26.4           py39h65a83cf_0
[conda] pytorch-mutex             1.0                         cpu    pytorch
[conda] torch                     1.13.1+cu117             pypi_0    pypi
[conda] torchaudio                0.13.1+cu117             pypi_0    pypi
[conda] torchvision               0.14.1+cu117             pypi_0    pypi

torch.cuda.nccl.is_available(torch.randn(1).cuda())

D:\anaconda\envs\McQuic_1\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False

I have the same issue. This is my output.

Collecting environment information…
PyTorch version: 2.2.2+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows Server 2019 Standard
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.17763-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 472.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2101
DeviceID=CPU0
Family=179
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2101
Name=Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
ProcessorType=3
Revision=21767

Architecture=9
CurrentClockSpeed=2101
DeviceID=CPU1
Family=179
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2101
Name=Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
ProcessorType=3
Revision=21767

Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] numpy==1.26.3
[pip3] torch==2.2.2+cu118
[pip3] torchaudio==2.2.2+cu118
[pip3] torchvision==0.17.2+cu118
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46358
[conda] mkl-service 2.4.0 py310h2bbff1b_1
[conda] mkl_fft 1.3.8 py310h2bbff1b_0
[conda] mkl_random 1.2.4 py310h59b6b97_0
[conda] numpy 1.26.3 py310h055cbcc_0
[conda] numpy-base 1.26.3 py310h65a83cf_0
[conda] torch 2.2.2+cu118 pypi_0 pypi
[conda] torchaudio 2.2.2+cu118 pypi_0 pypi
[conda] torchvision 0.17.2+cu118 pypi_0 pypi

As far as I can tell, this is a Windows issue, isn't it? The line:

torch.cuda.nccl.is_available(torch.randn(1).cuda())

also returns False.

I have the same issue.

python -m torch.utils.collect_env
output: Collecting environment information…
PyTorch version: 1.8.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows Server 2022 Datacenter
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.9 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4

Nvidia driver version: 551.78
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] torch==1.8.0+cu111
[pip3] torchaudio==2.2.0.dev20240426+cu121
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.19.0.dev20240426+cu121
[conda] _anaconda_depends 2024.02 py311_mkl_1
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 pypi_0 pypi
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.8 py311h2bbff1b_0
[conda] mkl_random 1.2.4 py311h59b6b97_0
[conda] numpy 1.26.4 py311hdab7c0b_0
[conda] numpy-base 1.26.4 py311hd01c5d8_0
[conda] numpydoc 1.5.0 py311haa95532_0
[conda] pytorch 2.2.2 py3.11_cuda11.8_cudnn8_0 pytorch
[conda] pytorch-cuda 11.8 h24eeafa_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.4.0.dev20240421+cu121 pypi_0 pypi
[conda] torchaudio 2.2.0.dev20240421+cu121 pypi_0 pypi
[conda] torchvision 0.17.2 pypi_0 pypi

import torch.cuda.nccl
torch.cuda.nccl.is_available(torch.randn(1).cuda())
output: False

You are running into the same issue as above.
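
On Windows, the usual workaround is to initialize the process group with the Gloo backend instead of NCCL, for example (a minimal single-machine sketch; the 127.0.0.1/29500 rendezvous values are placeholders):

import os
import torch.distributed as dist

# Placeholder rendezvous settings for a single-machine, single-process run.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Gloo is the backend bundled with the Windows builds.
dist.init_process_group(backend="gloo", init_method="env://", world_size=1, rank=0)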

Hi,
I have the same issue when running this code: https://github.com/12wang3/rrl
Here’s my environment information:

<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (Rev6, Built by MSYS2 project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
Nvidia driver version: 555.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2400
DeviceID=CPU0
Family=198
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2400
Name=13th Gen Intel(R) Core(TM) i7-13700H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1
[conda] Could not collect

And I am using just a single GPU, but the same error is raised.

raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in

Also, when I run the code below:

import torch
torch.cuda.nccl.is_available(torch.randn(1).cuda())

The output is:

G:\Program Files\anaconda3\Lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn("PyTorch is not compiled with NCCL support")
False

Thanks for your help

Same as above: use Gloo on Windows machines or disable the distributed usage.
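
If you only need one GPU, disabling the distributed usage means calling the training function once instead of spawning workers, and removing (or guarding) the init_process_group call and any DistributedDataParallel/DistributedSampler usage inside it. A rough sketch (hypothetical adaptation; the actual rrl code may differ):

# Hypothetical guard: only set up a process group when more than one worker is requested.
if args.world_size > 1:
    dist.init_process_group(backend='gloo', init_method='env://',
                            world_size=args.world_size, rank=rank)
# Otherwise train on a single GPU without any DDP wrapping.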

When I use the Gloo backend in the train_model function in the experiment.py file:

def train_model(gpu, args):
    rank = args.nr * args.gpus + gpu
    dist.init_process_group(backend='gloo', init_method='env://', world_size=args.world_size, rank=rank)

I get a new error:

(base) PS C:\Users\ASUS\Downloads\rrl-main\rrl-main> python experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -i 0 -wd 0.0001 --print_rule
[W socket.cpp:697] [c10d] The client socket has failed to connect to [Romina]:58753 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 174, in <module>
    train_main(rrl_args)
  File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 167, in train_main
    mp.spawn(train_model, nprocs=args.gpus, args=(args,))
  File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 281, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 237, in start_processes
    while not context.join():
              ^^^^^^^^^^^^^^
  File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 188, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 75, in _wrap
    fn(i, *args)
  File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 70, in train_model
    db_enc, train_loader, valid_loader, _ = get_data_loader(dataset, args.world_size, rank, args.batch_size,
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 26, in get_data_loader
    db_enc.fit(X_df, y_df)
  File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\rrl\utils.py", line 59, in fit
    self.y_fname = list(self.label_enc.get_feature_names(y_df.columns)) if self.y_one_hot else y_df.columns
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'OneHotEncoder' object has no attribute 'get_feature_names'

and I don’t know how to disable the distributed usage.

I just renamed 'get_feature_names' to 'get_feature_names_out' and that solved my issue!
Thanks
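
For anyone else who hits that last error: scikit-learn 1.0 renamed OneHotEncoder.get_feature_names to get_feature_names_out, and later releases removed the old name, so a version-tolerant variant of that line in rrl/utils.py could look like this (a sketch reusing self.label_enc, y_df, and self.y_one_hot from the traceback above):

# Support both the old and the new scikit-learn feature-name API.
if hasattr(self.label_enc, "get_feature_names_out"):
    y_names = list(self.label_enc.get_feature_names_out(y_df.columns))
else:
    y_names = list(self.label_enc.get_feature_names(y_df.columns))
self.y_fname = y_names if self.y_one_hot else y_df.columns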

I am getting this error:

raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

I am on Windows 11 with a 12 GB NVIDIA RTX 3060 GPU.
How can I resolve it?

CUDA available: True
CUDA version in PyTorch: 11.3
PyTorch version: 1.12.1+cu113
GPU is available

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-H0Q7GBK]:57324 (system error: 10049 - The requested address is not valid in its context.).
[2024-09-06 16:22:21,118][main][CRITICAL] - Training failed due to Distributed package doesn't have NCCL built in:
Traceback (most recent call last):
  File "bin/train.py", line 83, in main
    trainer.fit(training_model)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\trainer\trainer.py", line 496, in fit
    self.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\trainer\trainer.py", line 525, in pre_dispatch
    self.accelerator.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\accelerators\accelerator.py", line 83, in pre_dispatch
    self.training_type_plugin.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 258, in pre_dispatch
    self.init_ddp_connection(self.global_rank, self.world_size)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 241, in init_ddp_connection
    torch_distrib.init_process_group(self.torch_distributed_backend, rank=global_rank, world_size=world_size)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

The code is:


        trainer = Trainer(
            # there is no need to suppress checkpointing in ddp, because it handles rank on its own
            callbacks=ModelCheckpoint(dirpath=checkpoints_dir, **config.trainer.checkpoint_kwargs),
            logger=metrics_logger,
            default_root_dir=os.getcwd(),
            **trainer_kwargs
        )
        trainer.fit(training_model)

Same as above: use Gloo on Windows machines or disable the distributed usage.
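
With PyTorch Lightning the backend is chosen by the DDP plugin/strategy, so on Windows you would point it at Gloo, or (since your log shows MEMBER: 1/1) run on a single device without DDP at all. Roughly (hedged, because the exact API depends on your Lightning version; trainer_kwargs is the dict from your snippet):

# Newer Lightning (>= 1.6): select the backend through the DDP strategy.
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

trainer = Trainer(strategy=DDPStrategy(process_group_backend="gloo"), **trainer_kwargs)

# Older Lightning versions read the backend from an environment variable instead:
#   set PL_TORCH_DISTRIBUTED_BACKEND=gloo
# before launching the script (check whether your version still supports this).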