Same issue as before:
I've been trying to launch Llama from Meta for two days now =\ any advice? I keep getting this error:
RuntimeError: Distributed package doesn't have NCCL built in
python -m torch.utils.collect_env
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: N/A
Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060 6GB
Nvidia driver version: 536.67
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3501
DeviceID=CPU0
Family=205
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3501
Name=Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
ProcessorType=3
Revision=15363
Versions of relevant libraries:
[pip3] numpy==1.25.1
[pip3] torch==2.0.1+cu117
[pip3] torchaudio==2.0.2+cu117
[pip3] torchvision==0.15.2+cu117
[conda] Could not collect
>>> import torch
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
D:\pythonProjects\llama\env2\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
False
>>> torch.cuda.is_available()
True
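If you want to see which distributed backends your PyTorch build actually ships with, `torch.distributed` exposes availability checks for each one:

```python
import torch.distributed as dist

# Windows wheels of PyTorch are typically built without NCCL,
# while the gloo backend is generally available on all platforms.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())
print("MPI available:", dist.is_mpi_available())
```

On a Windows install like the one above, `is_nccl_available()` will report `False` even though CUDA itself works.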
You are running into the same issue as the two posters before you; check my previous post.
I have the same problem, but even worse. When I try to collect the PyTorch environment info, I get this:
$ python -m torch.utils.collect_env
Collecting environment information…
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 602, in <module>
main()
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 585, in main
output = get_pretty_env_info()
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 580, in get_pretty_env_info
return pretty_str(get_env_info())
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 422, in get_env_info
pip_version, pip_list_output = get_pip_packages(run_lambda)
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 394, in get_pip_packages
out = run_with_pip(sys.executable + ' -mpip')
File "C:\Users\s1834299\AppData\Roaming\Python\Python310\site-packages\torch\utils\collect_env.py", line 382, in run_with_pip
for line in out.splitlines()
AttributeError: 'NoneType' object has no attribute 'splitlines'
Please help. Thanks a lot
Hello,
I also have
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 37602) of binary: /home/user/miniconda3/envs/myenv/bin/python
Environment is Ubuntu 22.04 on WSL2:
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro K2200
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
CPU family: 6
Model: 63
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
BogoMIPS: 6983.82
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt flush_l1d arch_capabilities
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 15 MiB (1 instance)
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.24.3 py38h14f4228_0
[conda] numpy-base 1.24.3 py38h31eccc5_0
[conda] pytorch 2.0.1 py3.8_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h778d358_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.0.2 py38_cu117 pytorch
[conda] torchtriton 2.0.0 py38 pytorch
[conda] torchvision 0.15.2 py38_cu117 pytorch
You also won’t need NCCL for single-GPU workloads as already described, so don’t call into it.
Well, I am not calling it.
Meta AI is.
I suggest you consult with them; otherwise you will be facing a lot more posts like this.
This command from here: https://github.com/facebookresearch/codellama/blob/main/README.md
torchrun --nproc_per_node 1 example_infilling.py \
--ckpt_dir CodeLlama-7b/ \
--tokenizer_path CodeLlama-7b/tokenizer.model \
--max_seq_len 192 --max_batch_size 4
is directly leading to the error with the recommended torch setup.
However, it works when installed with this command:
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
I would recommend checking the code you are running instead and understanding why distributed calls are used, e.g. whether model sharding is a requirement.
If not, feel free to create an issue in the corresponding repositories so that the authors can fix it.
python -m torch.utils.collect_env
PyTorch version: 2.1.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.3 | packaged by Anaconda, Inc. | (main, Apr 19 2023, 23:46:34) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000
Nvidia driver version: 536.96
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3400
DeviceID=CPU0
Family=179
L2CacheSize=40960
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3400
Name=Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
ProcessorType=3
Revision=27142
Architecture=9
CurrentClockSpeed=3400
DeviceID=CPU1
Family=179
L2CacheSize=40960
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3400
Name=Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
ProcessorType=3
Revision=27142
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] pytorch-lightning==2.1.2
[pip3] torch==2.1.0+cu121
[pip3] torchmetrics==1.2.0
[pip3] torchvision==0.16.1
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h8bd8f75_46356
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.6 py311hf62ec03_1
[conda] mkl_random 1.2.2 py311hf62ec03_1
[conda] numpy 1.24.3 py311hdab7c0b_1
[conda] numpy-base 1.24.3 py311hd01c5d8_1
[conda] numpydoc 1.5.0 py311haa95532_0
trainer = pl.Trainer(
    devices=[0],
    accelerator="cuda",
    plugins=custom_ckpt,
    max_epochs=config.max_epochs,
    max_steps=config.max_steps,
    val_check_interval=config.val_check_interval,
    check_val_every_n_epoch=config.check_val_every_n_epoch,
    gradient_clip_val=config.gradient_clip_val,
    precision="bf16-mixed",
    num_sanity_val_steps=0,
    logger=logger,
    callbacks=[lr_callback, checkpoint_callback, bar],
)
Hi, I have this problem.
There is no error, but the loss value is wrong.
Please help me with this problem.
You are also using Windows, which does not support NCCL, so use a supported backend or a single GPU.
Thank you for answering. So I tried to use only a single GPU.
But the loss differs between CPU and GPU.
What is wrong?
Hi, I have the same problem. My PC runs Windows 11 with one GPU; when I run it, the output looks like this:
[W socket.cpp:663] [c10d] The client socket has failed to connect to [::]:5000 (system error: 10049 - The requested address is not valid in its context.).
Error executing job with overrides: ['exp_manager.name=first_model', 'exp_manager.resume_if_exists=true', 'exp_manager.resume_ignore_no_checkpoint=true', 'exp_manager.exp_dir=C:\Users\mahdy\PycharmProjects\stt_api\results', 'model.tokenizer.dir=C:\Users\mahdy\PycharmProjects\stt_api\tokenizer_spe_bpe_v1024_max_4\tokenizer_spe_bpe_v1024_max_4', 'model.train_ds.is_tarred=true', 'model.train_ds.tarred_audio_filepaths=C:\Users\mahdy\PycharmProjects\stt_api\train_tarred_1bk\audio__OP_0…1023_CL_.tar', 'model.train_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\train_tarred_1bk\tarred_audio_manifest.json', 'model.validation_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\validated_decoded_processed.json', 'model.test_ds.manifest_filepath=C:\Users\mahdy\PycharmProjects\stt_api\test_decoded_processed.json']
Traceback (most recent call last):
File "C:\Users\mahdy\PycharmProjects\stt_api\speech_to_text_ctc_bpe.py", line 87, in main
trainer.fit(asr_model)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 532, in fit
call._call_and_handle_interrupt(
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 42, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 571, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 938, in _run
self.strategy.setup_environment()
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 143, in setup_environment
self.setup_distributed()
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 192, in setup_distributed
_init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\lightning_fabric\utilities\distributed.py", line 258, in _init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "C:\Users\mahdy\PycharmProjects\stt_api\venv\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
How can I fix it? Please help me.
I am using it for the first time, to train an STT model.
I use CUDA 11.8
torch 2.1.2
Please, someone help me.
The same applies to you, too: Windows builds of PyTorch do not ship with NCCL, so use the gloo backend or a non-distributed single-GPU run.
Hi, I ran python -m torch.utils.collect_env
as suggested above and got the output below, but I cannot understand why I am still getting "NCCL is not available" when I have a CUDA version of PyTorch installed. Any help would be appreciated.
Collecting environment information...
PyTorch version: 2.2.0
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.104-tegra-aarch64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-7
Off-line CPU(s) list: 8-11
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Vendor ID: ARM
Model: 1
Model name: ARMv8 Processor rev 1 (v8l)
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 2 MiB
L3 cache: 4 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm
Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] torch==2.2.0
[pip3] torchaudio==2.2.0
[pip3] torchvision==0.17.0
[conda] numpy 1.26.1 pypi_0 pypi
[conda] numpy-base 1.26.3 py311h592f769_0
[conda] pytorch 2.2.0 cpu_py311h150d335_0
[conda] pytorch-cuda 11.8 h8dd9ede_2 pytorch
[conda] torchvision 0.15.2 cpu_py311h96b1cb9_0
You’ve installed a CPU-only version:
[conda] pytorch 2.2.0 cpu_py311h150d335_0
on a Tegra device.
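You can confirm from Python whether the installed wheel was built with CUDA support at all:

```python
import torch

# None on a CPU-only build; a version string like "11.8" on a CUDA build.
print("CUDA build version:", torch.version.cuda)
print("CUDA available at runtime:", torch.cuda.is_available())
```

Note that on Jetson/Tegra devices the standard conda and PyPI wheels are CPU-only; NVIDIA publishes separate aarch64 CUDA builds for JetPack.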
So if I uninstall that version it should work, right?
I am also facing the same error. Here are the environment details:
(base) D:\Shailender\Anaconda\model\llama>python -m torch.utils.collect_env
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information…
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060
Nvidia driver version: 551.61
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3600
DeviceID=CPU0
Family=107
L2CacheSize=3072
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3600
Name=AMD Ryzen 5 3600 6-Core Processor
ProcessorType=3
Revision=28928
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] numpydoc==1.5.0
[pip3] torch==2.2.1+cu121
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.17.1+cu121
[conda] _anaconda_depends 2023.09 py311_mkl_1
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46357
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.8 py311h2bbff1b_0
[conda] mkl_random 1.2.4 py311h59b6b97_0
[conda] numpy 1.24.3 py311hdab7c0b_1
[conda] numpy-base 1.24.3 py311hd01c5d8_1
[conda] numpydoc 1.5.0 py311haa95532_0
[conda] torch 2.2.1+cu121 pypi_0 pypi
[conda] torchaudio 2.2.1+cu121 pypi_0 pypi
[conda] torchvision 0.17.1+cu121 pypi_0 pypi
I am getting the following error