Hi
I am trying to run the Llama 2 LLM on Windows, using my GPU and CUDA.
I have followed the instructions for setting up a PyTorch environment in conda, trying all combinations of CUDA 11.8/11.7 and Python 3.9/3.10/3.11.
I keep getting the error below when I run the example script. It seems that this call:

torch.distributed.init_process_group("nccl")

is asking for NCCL, but I don't have that installed, and on conda it's a Linux-only package anyway; I'm using Windows.
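To confirm it's the build rather than my setup, I checked which distributed backends this PyTorch install actually reports as available (a quick check, assuming I'm reading the torch.distributed docs right):

```python
import torch.distributed as dist

# The Windows/macOS wheels ship without NCCL; gloo is the bundled backend.
print("nccl:", dist.is_nccl_available())
print("gloo:", dist.is_gloo_available())
print("mpi: ", dist.is_mpi_available())
```

On this machine nccl comes back False, which matches the RuntimeError below.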
Also, why is the PyTorch package trying to connect to Kubernetes?
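My guess (and it is only a guess) is that kubernetes.docker.internal is the hosts-file entry that Docker Desktop adds on Windows, and the rendezvous hostname is resolving to it. This is how I checked what those names resolve to on my machine:

```python
import socket

# See what the rendezvous-related hostnames resolve to locally.
# kubernetes.docker.internal is typically a Docker Desktop hosts-file entry.
for host in ("localhost", "kubernetes.docker.internal"):
    try:
        print(host, "->", socket.gethostbyname(host))
    except socket.gaierror as err:
        print(host, "-> unresolved:", err)
```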
It's the torchrun script in the conda environment's Scripts folder that fails with:

RuntimeError: Distributed package doesn't have NCCL built in
python -m torchrun-script --nproc_per_node 1 example_text_completion.py --ckpt_dir …\llama-2-7b --tokenizer_path …\llama-2-7b\tokenizer.model --max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "H:\llama2\repo\llama\example_text_completion.py", line 55, in <module>
    fire.Fire(main)
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "H:\llama2\repo\llama\example_text_completion.py", line 18, in main
    generator = Llama.build(
  File "H:\llama2\repo\llama\llama\generation.py", line 61, in build
    torch.distributed.init_process_group("nccl")
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 20656) of binary: U:\Miniconda3\envs\llama2env\python.exe
Traceback (most recent call last):
  File "U:\Miniconda3\envs\llama2env\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "U:\Miniconda3\envs\llama2env\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "U:\Miniconda3\envs\llama2env\Scripts\torchrun-script.py", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "U:\Miniconda3\envs\llama2env\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
example_text_completion.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2023-08-21_01:17:09
host : Lightning-III
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 20656)
error_file: <N/A>
traceback : To enable traceback see: Error Propagation — PyTorch 2.0 documentation
============================================================
Everything else seems to work (i.e. conda, CUDA, Python, torch); it just seems to be torchrun that fails.
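I haven't found a fix yet, but I'm wondering whether the right workaround is to initialize the process group with the gloo backend instead of nccl. A minimal single-process sketch of what I mean (the address/port values here are just the torchrun defaults, not something I've verified end to end):

```python
import os
import torch.distributed as dist

# Minimal single-process group; gloo ships in the Windows build,
# unlike NCCL, which is Linux-only.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend())
dist.destroy_process_group()
```

If that works, I assume the equivalent change for the Llama repo would be passing "gloo" where generation.py currently hard-codes "nccl".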
>>> import torch
>>> torch.cuda.is_available()
True
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.5495, 0.0281, 0.2566],
[0.7032, 0.1296, 0.8173],
[0.0329, 0.5500, 0.3025],
[0.6790, 0.0561, 0.3389],
[0.4403, 0.5365, 0.5513]])
And my conda packages:
(llama2env) H:\llama2\repo>conda list
# packages in environment at U:\Miniconda3\envs\llama2env:
#
# Name Version Build Channel
blas 1.0 mkl
brotlipy 0.7.0 py39h2bbff1b_1003
ca-certificates 2023.05.30 haa95532_0
certifi 2023.7.22 py39haa95532_0
cffi 1.15.1 py39h2bbff1b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
cryptography 41.0.2 py39hac1b9e3_0
cuda-cccl 12.2.128 0 nvidia
cuda-cudart 11.7.99 0 nvidia
cuda-cudart-dev 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-libraries-dev 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvrtc-dev 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
fairscale 0.4.13 pypi_0 pypi
filelock 3.9.0 py39haa95532_0
fire 0.5.0 pypi_0 pypi
freetype 2.12.1 ha860e81_0
giflib 5.2.1 h8cc25b3_3
idna 3.4 py39haa95532_0
intel-openmp 2023.1.0 h59b6b97_46319
jinja2 3.1.2 py39haa95532_0
jpeg 9e h2bbff1b_1
lerc 3.0 hd77b12b_0
libcublas 11.10.3.66 0 nvidia
libcublas-dev 11.10.3.66 0 nvidia
libcufft 10.7.2.124 0 nvidia
libcufft-dev 10.7.2.124 0 nvidia
libcurand 10.3.3.129 0 nvidia
libcurand-dev 10.3.3.129 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusolver-dev 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libcusparse-dev 11.7.4.91 0 nvidia
libdeflate 1.17 h2bbff1b_0
libnpp 11.7.4.75 0 nvidia
libnpp-dev 11.7.4.75 0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libnvjpeg-dev 11.8.0.2 0 nvidia
libpng 1.6.39 h8cc25b3_0
libtiff 4.5.0 h6c2663c_2
libuv 1.44.2 h2bbff1b_0
libwebp 1.2.4 hbc33d0d_1
libwebp-base 1.2.4 h2bbff1b_1
llama 0.0.1 dev_0 <develop>
lz4-c 1.9.4 h2bbff1b_0
markupsafe 2.1.1 py39h2bbff1b_0
mkl 2023.1.0 h6b88ed4_46357
mkl-service 2.4.0 py39h2bbff1b_1
mkl_fft 1.3.6 py39hf11a4ad_1
mkl_random 1.2.2 py39hf11a4ad_1
mpmath 1.3.0 py39haa95532_0
networkx 3.1 py39haa95532_0
numpy 1.25.2 py39h055cbcc_0
numpy-base 1.25.2 py39h65a83cf_0
openssl 3.0.10 h2bbff1b_0
pillow 9.4.0 py39hd77b12b_0
pip 23.2.1 py39haa95532_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.2.0 py39haa95532_0
pysocks 1.7.1 py39haa95532_0
python 3.9.17 h1aa4202_0
pytorch 2.0.1 py3.9_cuda11.7_cudnn8_0 pytorch
pytorch-cuda 11.7 h16d0643_5 pytorch
pytorch-mutex 1.0 cuda pytorch
requests 2.31.0 py39haa95532_0
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.0.0 py39haa95532_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h2bbff1b_0
sympy 1.11.1 py39haa95532_0
tbb 2021.8.0 h59b6b97_0
termcolor 2.3.0 pypi_0 pypi
tk 8.6.12 h2bbff1b_0
torchaudio 2.0.2 pypi_0 pypi
torchvision 0.15.2 pypi_0 pypi
typing_extensions 4.7.1 py39haa95532_0
tzdata 2023c h04d1e81_0
urllib3 1.26.16 py39haa95532_0
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wheel 0.38.4 py39haa95532_0
win_inet_pton 1.1.0 py39haa95532_0
xz 5.4.2 h8cc25b3_0
zlib 1.2.13 h8cc25b3_0
zstd 1.5.5 hd43e919_0
And the environment:
python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: (GCC) 4.4.3
Clang version: 11.1.0
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.17 (main, Jul 5 2023, 20:47:11) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 536.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=4501
DeviceID=CPU0
Family=107
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=4501
Name=AMD Ryzen 9 7950X 16-Core Processor
ProcessorType=3
Revision=24834
Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46357
[conda] mkl-service 2.4.0 py39h2bbff1b_1
[conda] mkl_fft 1.3.6 py39hf11a4ad_1
[conda] mkl_random 1.2.2 py39hf11a4ad_1
[conda] numpy 1.25.2 py39h055cbcc_0
[conda] numpy-base 1.25.2 py39h65a83cf_0
[conda] pytorch 2.0.1 py3.9_cuda11.7_cudnn8_0 pytorch
[conda] pytorch-cuda 11.7 h16d0643_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.0.2 pypi_0 pypi
[conda] torchvision 0.15.2 pypi_0 pypi