RuntimeError: Distributed package doesn't have NCCL built in

Same issue as before:

I've been trying to launch Llama from Meta for two days now =\ any advice? I keep getting this error:

RuntimeError: Distributed package doesn't have NCCL built in
python -m torch.utils.collect_env
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060 6GB
Nvidia driver version: 536.67
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3501
DeviceID=CPU0
Family=205
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3501
Name=Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
ProcessorType=3
Revision=15363

Versions of relevant libraries:
[pip3] numpy==1.25.1
[pip3] torch==2.0.1+cu117
[pip3] torchaudio==2.0.2+cu117
[pip3] torchvision==0.15.2+cu117
[conda] Could not collect
>>> import torch
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
D:\pythonProjects\llama\env2\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False
>>> torch.cuda.is_available()
True

It's the same issue as for the two posters before you; check my previous post: Windows builds of PyTorch don't ship NCCL.
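For a quick check on your side, a small sketch using only the public torch.distributed queries (no CUDA tensor needed) shows which backends your binaries actually ship:

import torch
import torch.distributed as dist

print(torch.__version__, torch.version.cuda)  # build version and the CUDA it was built with
print(torch.cuda.is_available())              # CUDA itself works
print(dist.is_available())                    # distributed package compiled in
print(dist.is_nccl_available())               # False on Windows builds: no NCCL
print(dist.is_gloo_available())               # gloo is the backend Windows builds ship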


I have the same problem, but even worse. When I try to collect the PyTorch environment info, I get this:
$ python -m torch.utils.collect_env
Collecting environment information...
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 602, in <module>
    main()
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 585, in main
    output = get_pretty_env_info()
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 580, in get_pretty_env_info
    return pretty_str(get_env_info())
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 422, in get_env_info
    pip_version, pip_list_output = get_pip_packages(run_lambda)
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 394, in get_pip_packages
    out = run_with_pip(sys.executable + ' -mpip')
  File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 382, in run_with_pip
    for line in out.splitlines()
AttributeError: 'NoneType' object has no attribute 'splitlines'
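For reference, the step that fails is collect_env querying pip in a subprocess and getting nothing back; a minimal sketch of roughly that same query (assuming the same interpreter) can help narrow it down:

import subprocess
import sys

# Roughly what collect_env does to list installed packages; if this prints
# nothing, collect_env sees None and crashes with the splitlines() error above.
result = subprocess.run(
    [sys.executable, "-m", "pip", "list", "--format=freeze"],
    capture_output=True, text=True,
)
print(result.returncode)
print(result.stdout or result.stderr)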

Please help. Thanks a lot :)

Hello,

I also have

RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 37602) of binary: /home/user/miniconda3/envs/myenv/bin/python

Environment is Ubuntu 22.04 on WSL2:

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)  [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro K2200
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   46 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-11
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
CPU family:                      6
Model:                           63
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
Stepping:                        2
BogoMIPS:                        6983.82
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt flush_l1d arch_capabilities
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       192 KiB (6 instances)
L1i cache:                       192 KiB (6 instances)
L2 cache:                        1.5 MiB (6 instances)
L3 cache:                        15 MiB (1 instance)
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas                      1.0                         mkl
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0            py38h95df7f1_0    conda-forge
[conda] mkl_fft                   1.3.1            py38h8666266_1    conda-forge
[conda] mkl_random                1.2.2            py38h1abd341_0    conda-forge
[conda] numpy                     1.24.3           py38h14f4228_0
[conda] numpy-base                1.24.3           py38h31eccc5_0
[conda] pytorch                   2.0.1           py3.8_cuda11.7_cudnn8.5.0_0    pytorch
[conda] pytorch-cuda              11.7                 h778d358_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                2.0.2                py38_cu117    pytorch
[conda] torchtriton               2.0.0                      py38    pytorch
[conda] torchvision               0.15.2               py38_cu117    pytorch

You also won’t need NCCL for single-GPU workloads as already described, so don’t call into it.
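If the script insists on initializing a process group even for one GPU, a minimal sketch of selecting whatever backend the build actually ships (NCCL exists only in Linux CUDA builds; gloo is the fallback):

import torch
import torch.distributed as dist

def pick_backend() -> str:
    # NCCL is only shipped in Linux CUDA builds; Windows and CPU builds have gloo.
    if torch.cuda.is_available() and dist.is_nccl_available():
        return "nccl"
    return "gloo"

# torchrun sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE, so env:// works here.
dist.init_process_group(backend=pick_backend(), init_method="env://")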

Well, I am not calling it.
Meta AI is.
I suggest you consult with them; otherwise you will be facing a hell of a lot more posts like this.
This command, from https://github.com/facebookresearch/codellama/blob/main/README.md:

torchrun --nproc_per_node 1 example_infilling.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 192 --max_batch_size 4

leads directly to the error with the recommended torch setup.

However, it works when installed with this command:
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

I would recommend checking the code you are running instead and understanding why distributed calls are used, e.g. whether model sharding is a requirement.
If not, feel free to create an issue in the corresponding repositories so that the authors can fix it.
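As a rough illustration of that check (a hypothetical guard, not code from any of the repos discussed here): only set up a process group when more than one worker is requested, and fall back to gloo where NCCL is missing.

import os
import torch.distributed as dist

world_size = int(os.environ.get("WORLD_SIZE", "1"))
if world_size > 1:
    # Sharding across GPUs is actually requested, so a process group is needed.
    backend = "nccl" if dist.is_nccl_available() else "gloo"
    dist.init_process_group(backend)
# With a single worker the model can simply run on one GPU without torch.distributed.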

python -m torch.utils.collect_env

PyTorch version: 2.1.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.3 | packaged by Anaconda, Inc. | (main, Apr 19 2023, 23:46:34) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000

Nvidia driver version: 536.96
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3400
DeviceID=CPU0
Family=179
L2CacheSize=40960
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3400
Name=Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
ProcessorType=3
Revision=27142

Architecture=9
CurrentClockSpeed=3400
DeviceID=CPU1
Family=179
L2CacheSize=40960
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3400
Name=Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
ProcessorType=3
Revision=27142

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] pytorch-lightning==2.1.2
[pip3] torch==2.1.0+cu121
[pip3] torchmetrics==1.2.0
[pip3] torchvision==0.16.1
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h8bd8f75_46356
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.6 py311hf62ec03_1
[conda] mkl_random 1.2.2 py311hf62ec03_1
[conda] numpy 1.24.3 py311hdab7c0b_1
[conda] numpy-base 1.24.3 py311hd01c5d8_1
[conda] numpydoc 1.5.0 py311haa95532_0

trainer = pl.Trainer(
    devices=[0],
    accelerator="cuda",
    plugins=custom_ckpt,
    max_epochs=config.max_epochs,
    max_steps=config.max_steps,
    val_check_interval=config.val_check_interval,
    check_val_every_n_epoch=config.check_val_every_n_epoch,
    gradient_clip_val=config.gradient_clip_val,
    precision="bf16-mixed",
    num_sanity_val_steps=0,
    logger=logger,
    callbacks=[lr_callback, checkpoint_callback, bar],
)

Hi, I have this problem.
The run itself raises no error, but the loss is wrong.
Please help me with this problem.

You are also using Windows, which does not support NCCL, so use a supported backend or a single GPU.
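With Lightning that would look roughly like the sketch below (assuming pytorch_lightning 2.x): a single device never initializes torch.distributed, and DDPStrategy's process_group_backend argument selects gloo if you really need DDP on Windows.

import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# Option 1: a single device never initializes torch.distributed, so NCCL is never touched.
trainer = pl.Trainer(accelerator="cuda", devices=1)

# Option 2: if you really need DDP across several devices on Windows, force the gloo backend.
trainer_ddp = pl.Trainer(
    accelerator="cuda",
    devices=2,
    strategy=DDPStrategy(process_group_backend="gloo"),
)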

Thank you for answering. I tried using only a single GPU.
But the loss on CPU and GPU is different.
What is wrong?

Hi, I have the same problem. I have a Windows 11 PC with one GPU. If I run it, the output looks like this:
[W socket.cpp:663] [c10d] The client socket has failed to connect to [::]:5000 (system error: 10049 - The requested address is not valid in its context.).
Error executing job with overrides: ['exp_manager.name=first_model', 'exp_manager.resume_if_exists=true', 'exp_manager.resume_ignore_no_checkpoint=true', 'exp_manager.exp_dir=C:\Users\mahdy\PycharmProjects\stt_api\results', 'model.tokenizer.dir=C:\Users\mahdy\PycharmProjects\stt_api\tokenizer_spe_bpe_v1024_max_4\tokenizer_spe_bpe_v1024_max_4', 'model.train_ds.is_tarred=true', 'model.train_ds.tarred_audio_filepaths=C:\Users\mahdy\PycharmProjects\stt_api\train_tarred_1bk\audio__OP_0…1023_CL_.tar', 'model.train_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\train_tarred_1bk\tarred_audio_manifest.json', 'model.validation_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\validated_decoded_processed.json', 'model.test_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\test_decoded_processed.json']
Traceback (most recent call last):
  File "C:\Users\mahdy\PycharmProjects\stt_api\speech_to_text_ctc_bpe.py", line 87, in main
    trainer.fit(asr_model)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 938, in _run
    self.strategy.setup_environment()
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 143, in setup_environment
    self.setup_distributed()
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 192, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\lightning_fabric\utilities\distributed.py", line 258, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in

How can I fix this? Please help me.
I am using this for the first time, to train an STT model.

I use CUDA 11.8 and torch 2.1.2.

Please, someone help me.

The same applies to you, too: Windows builds of PyTorch don't include NCCL, so use the gloo backend or train on a single GPU without DDP.

Hi, I ran python -m torch.utils.collect_env as suggested above and got the output below, but I cannot understand why I am still getting "NCCL is not available" when I have a CUDA version of PyTorch installed. Any help would be appreciated.

Collecting environment information...
PyTorch version: 2.2.0
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 22 2023, 10:22:35)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.104-tegra-aarch64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-7
Off-line CPU(s) list:            8-11
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       2
Vendor ID:                       ARM
Model:                           1
Model name:                      ARMv8 Processor rev 1 (v8l)
Stepping:                        r0p1
CPU max MHz:                     2201.6001
CPU min MHz:                     115.2000
BogoMIPS:                        62.50
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        2 MiB
L3 cache:                        4 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] torch==2.2.0
[pip3] torchaudio==2.2.0
[pip3] torchvision==0.17.0
[conda] numpy                     1.26.1                   pypi_0    pypi
[conda] numpy-base                1.26.3          py311h592f769_0
[conda] pytorch                   2.2.0           cpu_py311h150d335_0
[conda] pytorch-cuda              11.8                 h8dd9ede_2    pytorch
[conda] torchvision               0.15.2          cpu_py311h96b1cb9_0

You’ve installed a CPU-only version:

[conda] pytorch                   2.2.0           cpu_py311h150d335_0

on a Tegra device.
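A quick way to confirm which build is actually being imported (just standard torch attributes; if torch.version.cuda prints None, it is the CPU-only package):

import torch

print(torch.__version__)          # e.g. "2.2.0" with no +cuXXX suffix for CPU builds
print(torch.version.cuda)         # None means the binary was built without CUDA
print(torch.cuda.is_available())  # False on a CPU-only install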

So if I uninstall that version it should work, right?

I am also facing the same error. Here are the environment details.
(base) D:\Shailender\Anaconda\model\llama>python -m torch.utils.collect_env
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060
Nvidia driver version: 551.61
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3600
DeviceID=CPU0
Family=107
L2CacheSize=3072
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3600
Name=AMD Ryzen 5 3600 6-Core Processor
ProcessorType=3
Revision=28928

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] torch==2.2.1+cu121
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.17.1+cu121
[conda] _anaconda_depends 2023.09 py311_mkl_1
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46357
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.8 py311h2bbff1b_0
[conda] mkl_random 1.2.4 py311h59b6b97_0
[conda] numpy 1.24.3 py311hdab7c0b_1
[conda] numpy-base 1.24.3 py311hd01c5d8_1
[conda] numpydoc 1.5.0 py311haa95532_0
[conda] torch 2.2.1+cu121 pypi_0 pypi
[conda] torchaudio 2.2.1+cu121 pypi_0 pypi
[conda] torchvision 0.17.1+cu121 pypi_0 pypi

I am getting the following error:

(base) D:\Shailender\Anaconda\model\llama>torchrun --nproc_per_node 1 example_chat_completion1.py --ckpt_dir llama-2-7b-chat\llama-2-7b-chat --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
[2024-03-14 13:26:33,922] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:697] [c10d] The client socket has failed to connect to [DESKTOP-94U06FB]:29500 (system error: 10049 - The requested address is not valid in its context.).
D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\distributed_c10d.py:608: UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
  warnings.warn("Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
[W socket.cpp:697] [c10d] The client socket has failed to connect to [DESKTOP-94U06FB]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "D:\Shailender\Anaconda\model\llama\example_chat_completion1.py", line 63, in <module>
    fire.Fire(main)
  File "D:\Shailender\Anaconda\Lib\site-packages\fire\core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\fire\core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\fire\core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\model\llama\example_chat_completion1.py", line 35, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\model\llama\llama\generation.py", line 85, in build
    torch.distributed.init_process_group("nccl")
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
RuntimeError: Distributed package doesn’t have NCCL built in
[2024-03-14 13:26:38,965] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 14360) of binary: D:\Shailender\Anaconda\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\Shailender\Anaconda\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\run.py", line 812, in main
    run(args)
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Shailender\Anaconda\Lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_chat_completion1.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-03-14_13:26:38
host : DESKTOP-94U06FB
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14360)
error_file: <N/A>
traceback : To enable traceback see: Error Propagation — PyTorch 2.2 documentation