Hi, I’m trying to get torchrun working on my M1 Pro Mac. I saw the other forum posts on this topic, but development moves quickly and none of them got me there. So I downloaded Llama 3, installed it with pip install -e ., and tried to run torchrun with the following command:
torchrun --nproc_per_node 1 example_chat_completion.py \
--no_cuda=True --ckpt_dir Meta-Llama-3-8B-Instruct/ \
--tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
Then I get the following error:
Traceback (most recent call last):
  File "/Users/vortec/workspace/llm/llama3/example_chat_completion.py", line 84, in <module>
    fire.Fire(main)
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vortec/workspace/llm/llama3/example_chat_completion.py", line 31, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "/Users/vortec/workspace/llm/llama3/llama/generation.py", line 68, in build
    torch.distributed.init_process_group("nccl")
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vortec/workspace/instances/llama3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1302, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
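For what it’s worth, here is a small sketch of the check I ran to see which distributed backends my PyTorch build actually supports (the print labels are mine; the MPS check is just for completeness):

```python
# Check which torch.distributed backends this PyTorch build supports.
import torch
import torch.distributed as dist

print("distributed available:", dist.is_available())
print("nccl available:", dist.is_nccl_available())   # False in the macOS wheels, since NCCL is CUDA-only
print("gloo available:", dist.is_gloo_available())
print("mps available:", torch.backends.mps.is_available())
```

On my machine nccl comes back False and gloo True, which matches the RuntimeError above.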
Here’s the output of python -m torch.utils.collect_env:
Collecting environment information...
PyTorch version: 2.2.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.2.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.1.0.2.5)
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.4 (main, Jun 20 2023, 17:23:00) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-14.2.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Pro
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.2
[pip3] torchaudio==2.2.2
[pip3] torchvision==0.17.2
[conda] Could not collect
What can I do to fix it?