Apple macOS M2 Silicon: NCCL distributed runtime error

akramIOT · March 6, 2024, 7:31am

PROBLEM: NCCL distributed "not found" error.

I understand that Apple M2 Silicon does not support CUDA or NCCL, but how do I fix this error?

(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % python3 test_torch.py
tensor([[0.5552, 0.4753, 0.6758],
[0.3080, 0.7625, 0.7667],
[0.5621, 0.6176, 0.2445],
[0.6803, 0.3974, 0.7331],
[0.3485, 0.3801, 0.9699]])
False
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %

Test Script:
import torch
x = torch.rand(5,3)
print(x)

print(torch.cuda.is_available())
#print(torch.cuda.nccl.is_available(tensors=x))
#print(torch.cuda.nccl.is_available(torch.randn(1).cuda(

(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 6
[2024-03-05 23:30:17,309] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:608: UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
  warnings.warn("Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
Traceback (most recent call last):
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/example_text_completion.py", line 69, in <module>
    fire.Fire(main)
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/example_text_completion.py", line 32, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/llama/generation.py", line 85, in build
    torch.distributed.init_process_group("nccl")
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[2024-03-05 23:30:22,330] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 48795) of binary: /opt/anaconda3/envs/New_Torch/bin/python
Traceback (most recent call last):
  File "/opt/anaconda3/envs/New_Torch/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.2.1', 'console_scripts', 'torchrun')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_text_completion.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-03-05_23:30:22
host : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 48795)
error_file: <N/A>
traceback : To enable traceback see: Error Propagation — PyTorch 2.2 documentation

(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %

Reference: I am still getting the NCCL error on macOS M2 chip even with the workaround mentioned for MPS as per this repo, any thoughts? · aggiee/llama-v2-mps · Discussion #2 · GitHub

ptrblck · March 6, 2024, 2:15pm

Don’t use any CUDA or NCCL calls on your setup which does not support them by removing the corresponding PyTorch operations. Alternatively, run your code on a Linux platform with a GPU and it should work.
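As a minimal sketch of what "removing the NCCL call" could look like: the traceback points at `torch.distributed.init_process_group("nccl")` in `llama/generation.py`, and one common substitute on machines without NVIDIA GPUs is the CPU-based `gloo` backend. The helper names `choose_backend` and `init_distributed` below are hypothetical, not part of the llama repository, and this assumes the installed PyTorch build includes Gloo support. Other CUDA-specific calls in the same code would still need separate changes.

```python
def choose_backend(cuda_available: bool, nccl_available: bool) -> str:
    """Return "nccl" only when both CUDA and NCCL are usable, else "gloo"."""
    return "nccl" if (cuda_available and nccl_available) else "gloo"


def init_distributed() -> str:
    """Initialise the default process group with a backend this machine supports.

    Hypothetical replacement for a hard-coded init_process_group("nccl").
    torch is imported lazily so choose_backend() stays usable without it.
    """
    import torch
    import torch.distributed as dist

    backend = choose_backend(torch.cuda.is_available(), dist.is_nccl_available())
    if not dist.is_initialized():
        # Relies on the environment variables that torchrun sets (RANK, WORLD_SIZE, ...).
        dist.init_process_group(backend)
    return backend
```

On an M2 Mac, `torch.cuda.is_available()` is `False` (as the test script output above shows), so this sketch would select `gloo`.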

akramIOT · March 6, 2024, 5:11pm

@ptrblck: How do I ensure that there are no CUDA or NCCL calls, given that this is basic vanilla code I have taken for macOS as recommended? Is there any command output I can check to validate this?
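One way to get such an output is to ask PyTorch directly which backends the installed build supports. The following is a sketch only; the function name `torch_capabilities` is made up for illustration, and it assumes a PyTorch version recent enough to expose `torch.backends.mps`.

```python
import importlib.util


def torch_capabilities() -> dict:
    """Report which accelerator and distributed backends this PyTorch build exposes."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch
    import torch.distributed as dist

    distributed = dist.is_available()
    return {
        "torch_installed": True,
        "cuda": torch.cuda.is_available(),
        "mps": torch.backends.mps.is_available(),
        "distributed": distributed,
        # The backend probes only exist when distributed support is compiled in.
        "nccl": distributed and dist.is_nccl_available(),
        "gloo": distributed and dist.is_gloo_available(),
    }


print(torch_capabilities())
```

If `nccl` is reported as `False`, any `init_process_group("nccl")` call in the code being run will fail with the error shown in this thread.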

akramIOT · March 7, 2024, 5:37am

@ptrblck:

(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 6
[2024-03-06 21:35:16,499] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:608: UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
  warnings.warn("Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
Traceback (most recent call last):
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/example_text_completion.py", line 69, in <module>
    fire.Fire(main)
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/example_text_completion.py", line 32, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "/Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama/llama/generation.py", line 85, in build
    torch.distributed.init_process_group("nccl")
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
[2024-03-06 21:35:21,518] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 56823) of binary: /opt/anaconda3/envs/New_Torch/bin/python
Traceback (most recent call last):
  File "/opt/anaconda3/envs/New_Torch/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.2.1', 'console_scripts', 'torchrun')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_text_completion.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-03-06_21:35:21
host : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 56823)
error_file: <N/A>
traceback : To enable traceback see: Error Propagation — PyTorch 2.2 documentation

(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % pip list
Package Version Editable project location

absl-py 1.4.0
accelerate 0.27.2
aiohttp 3.9.3
aiosignal 1.2.0
anyio 3.5.0
appdirs 1.4.4
appnope 0.1.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asgiref 3.4.1
asttokens 2.0.5
astunparse 1.6.3
async-lru 2.0.4
attrs 21.2.0
azure-core 1.14.0
azure-storage-blob 12.9.0
Babel 2.11.0
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.42.0
black 24.2.0
bleach 4.1.0
blinker 1.6.2
Bottleneck 1.3.7
Brotli 1.1.0
cachetools 4.2.2
certifi 2024.2.2
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 1.1.1
colorama 0.4.4
coloredlogs 15.0.1
comm 0.2.1
contourpy 1.2.0
cryptography 41.0.3
cycler 0.11.0
datasets 2.18.0
debugpy 1.6.7
decorator 5.1.0
defusedxml 0.7.1
Deprecated 1.2.13
dill 0.3.8
entrypoints 0.4
etils 1.7.0
evaluate 0.4.1
executing 0.8.3
fairscale 0.4.13
fastjsonschema 2.16.2
filelock 3.13.1
fire 0.5.0
Flask 2.0.2
flatbuffers 23.5.26
fonttools 4.25.0
frozenlist 1.4.0
fsspec 2024.2.0
future 0.18.2
gast 0.4.0
gmpy2 2.1.2
google-auth 2.22.0
google-auth-oauthlib 0.5.2
google-pasta 0.2.0
grpcio 1.48.2
gunicorn 20.1.0
h11 0.12.0
h5py 3.9.0
huggingface-hub 0.21.3
humanfriendly 10.0
idna 3.3
importlib_resources 6.1.2
inflate64 1.0.0
ipykernel 6.28.0
ipython 8.20.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
isodate 0.6.0
isort 5.9.3
itsdangerous 2.0.1
jax 0.3.15
jedi 0.18.1
Jinja2 3.0.2
jmespath 0.10.0
joblib 1.1.0
json5 0.9.6
jsonschema 4.17.3
jupyter 1.0.0
jupyter-client 7.1.2
jupyter-console 6.6.3
jupyter_core 5.5.0
jupyter-events 0.6.3
jupyter-lsp 2.2.0
jupyter-server 1.13.5
jupyter_server_terminals 0.4.4
jupyterlab 2.3.1
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupyterlab-widgets 1.0.0
keras 2.15.0
Keras-Preprocessing 1.1.2
kiwisolver 1.4.4
lazy-object-proxy 1.6.0
libclang 16.0.6
llama 0.0.1 /Users/akram_personal/AKRAM_CODE_FOLDER/AKRAM_LLM/LLAMA_MODELS/Meta_Proj/llama
llama-recipes 0.0.1
loralib 0.1.2
lxml 4.9.3
Markdown 3.4.1
MarkupSafe 2.1.3
matplotlib 3.8.0
matplotlib-inline 0.1.6
mccabe 0.6.1
mistune 0.8.4
ml-dtypes 0.2.0
mmh3 3.0.0
mpmath 1.3.0
msrest 0.6.21
multidict 6.0.4
multiprocess 0.70.16
multivolumefile 0.2.3
munkres 1.1.4
mypy-extensions 1.0.0
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 6.5.4
nbformat 5.1.3
nest-asyncio 1.5.1
networkx 3.1
notebook 6.5.4
notebook_shim 0.2.3
numexpr 2.8.7
numpy 1.23.5
oauthlib 3.2.2
opt-einsum 3.3.0
optimum 1.17.1
overrides 7.4.0
packaging 23.2
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.2
pathspec 0.12.1
peft 0.9.0
pexpect 4.8.0
pickleshare 0.7.5
pillow 10.2.0
pip 23.3.1
platformdirs 3.10.0
ply 3.11
prometheus-client 0.14.1
prompt-toolkit 3.0.43
protobuf 3.20.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py7zr 0.21.0
pyarrow 15.0.0
pyarrow-hotfix 0.6
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybcj 1.0.2
pycparser 2.20
pycryptodomex 3.20.0
pydantic 1.8.2
Pygments 2.10.0
PyJWT 2.4.0
pyOpenSSL 23.2.0
pyparsing 3.0.9
pyppmd 1.1.0
PyQt5 5.15.10
PyQt5-sip 12.13.0
pyrsistent 0.18.0
PySocks 1.7.1
python-dateutil 2.8.2
python-editor 1.0.4
python-json-logger 2.0.7
pytz 2021.3
PyYAML 6.0
pyzmq 24.0.1
pyzstd 0.15.9
qtconsole 5.5.1
QtPy 2.4.1
regex 2023.12.25
requests 2.26.0
requests-oauthlib 1.3.0
requests-toolbelt 0.9.1
responses 0.18.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.10.6
rsa 4.7.2
safetensors 0.4.2
scipy 1.11.4
seaborn 0.11.2
Send2Trash 1.8.0
sentencepiece 0.2.0
setuptools 68.2.2
sip 6.7.12
six 1.16.0
sniffio 1.2.0
soupsieve 2.5
stack-data 0.2.0
sympy 1.12
tenacity 8.0.1
tensorboard 2.15.2
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
tensorflow 2.12.0
tensorflow-estimator 2.15.0
tensorflow-io-gcs-filesystem 0.36.0
tensorflow-macos 2.15.0
tensorflow-metal 1.1.0
termcolor 2.1.0
terminado 0.17.1
testpath 0.5.0
texttable 1.7.0
tinycss2 1.2.1
tokenize-rt 5.2.0
tokenizers 0.15.2
toml 0.10.2
torch 2.2.1
torchaudio 2.2.1
tornado 6.3.3
tqdm 4.62.3
traitlets 5.7.1
transformers 4.38.2
typing_extensions 4.10.0
tzdata 2023.3
urllib3 1.26.18
uvloop 0.16.0
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
Werkzeug 2.0.2
wheel 0.35.1
widgetsnbextension 3.5.2
wrapt 1.14.1
xxhash 3.4.1
yarl 1.9.3
zipp 3.6.0
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % pip list | grep nccl
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama % pip list | grep cpu
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %
(New_Torch) akram_personal@AKRAMs-MacBook-Pro llama %

akramIOT · March 8, 2024, 3:42am

@ptrblck: In fact, I have also validated that Torch is not compiled with NCCL support. I would appreciate any thoughts on how to resolve this on the M2 macOS chip.

/opt/anaconda3/envs/New_Torch/lib/python3.11/site-packages/torch/cuda/nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn("PyTorch is not compiled with NCCL support")
False

ptrblck · March 8, 2024, 3:58am

There is no solution to compile NCCL on MacOS, as it does not support NVIDIA GPUs. The right approach is to check the code you are running and either disable all NCCL calls (or replace these with another library supported on Mac) or to use a Linux workstation with NVIDIA GPUs as already mentioned.
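To "check the code you are running", a small script can list the lines that are likely to need attention. This is a rough sketch under stated assumptions: the function name `find_cuda_calls` and the regular expression are illustrative choices, the pattern is a heuristic that can miss calls or flag harmless lines, and the results still need to be reviewed by hand.

```python
import re
from pathlib import Path

# Heuristic patterns for NCCL/CUDA usage; an assumption, not an exhaustive list.
CUDA_PATTERN = re.compile(
    r"""init_process_group\(\s*["']nccl["']   # hard-coded NCCL backend
      | \.cuda\(                              # tensor.cuda() / module.cuda()
      | torch\.cuda\.                         # any torch.cuda.* API
      | ["']cuda(:\d+)?["']                   # device strings such as "cuda" or "cuda:0"
    """,
    re.VERBOSE,
)


def find_cuda_calls(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each suspicious line under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CUDA_PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_cuda_calls("llama")` from the repository root would, for example, be expected to report the `init_process_group("nccl")` line in `generation.py` that appears in the traceback above.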
