CPU-only install still defaults to GPU no matter what

Question: is a built-in NCCL backend required for CPU-only torch runs?

Here is the command I used to install:

conda install pytorch torchvision torchaudio cpuonly -c pytorch
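
For reference, a quick way to confirm what that build actually shipped with (plain PyTorch calls, nothing CodeLlama-specific):

import torch
import torch.distributed as dist

# A cpuonly build should report no CUDA and no NCCL, but Gloo is
# compiled into the standard CPU wheels:
print(torch.__version__, torch.cuda.is_available())        # e.g. "2.0.1 False"
print(dist.is_nccl_available(), dist.is_gloo_available())  # expect "False True"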

$ torchrun --nproc_per_node 1 example_completion.py --ckpt_dir CodeLlama-34b/ --tokenizer_path CodeLlama-34b/tokenizer.model --max_seq_len 128 --max_batch_size 4

[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
Traceback (most recent call last):
  File "/home/jpop/llama/codellama/example_completion.py", line 55, in <module>
    fire.Fire(main)
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/llama/codellama/example_completion.py", line 20, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "/home/jpop/llama/codellama/llama/generation.py", line 68, in build
    torch.distributed.init_process_group("nccl")
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 99331) of binary: /home/jpop/.conda/envs/codellama/bin/python
Traceback (most recent call last):
  File "/home/jpop/.conda/envs/codellama/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/.conda/envs/codellama/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------

Where is this model coming from? It should be indicated in the docs. You do not need NCCL, but it is likely being initialised by the code.

It is Facebook’s / Meta’s GitHub repo for CodeLlama; I’m using their 34B model. Hmm, if it is in the code from Facebook, maybe I can figure out how to use my Nvidia card. It is an older gaming GPU, so I’m not holding my breath. I wonder if they have a CPU repository.


I’d try Colab and the 7B model first (What's the machine requirements for each model? · Issue #30 · facebookresearch/codellama · GitHub), and use the GPUs there.

The 34B-parameter model is way too heavy and will take minutes per completion on your CPU, I assume.

Anyhow, here is someone with the same issue: RuntimeError: Distributed package doesn't have NCCL built in · Issue #70 · facebookresearch/codellama · GitHub

And how they fixed it (for the 7B):

As of now, for the 7B parameter model, it’s working on Windows after changing the generation.py file to use torch.distributed.init_process_group("gloo") instead of "nccl".
Is this methodology fine if I want to use a higher-parameter model in the future?
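
Concretely, the one-line change in llama/generation.py (line 68 in the traceback above) would look roughly like this — a sketch of the backend switch, not an official patch:

import torch

# Stock Llama.build hard-codes the CUDA-only NCCL backend:
#     torch.distributed.init_process_group("nccl")
# Choosing the backend from CUDA availability keeps GPU runs working
# while falling back to Gloo on CPU-only installs:
backend = "nccl" if torch.cuda.is_available() else "gloo"
torch.distributed.init_process_group(backend)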

Hmm, well my CPU is one of the first Intel i9 chips. My attempts to use the Nvidia GPU are spitting out this error:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
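
From what I can tell, "invalid device ordinal" means the launch asked for a GPU index that doesn’t exist — e.g. --nproc_per_node larger than the number of visible devices. A quick check of what PyTorch actually sees:

import torch

# NCCL/CUDA runs need one visible GPU per local rank, so device_count()
# must be at least torchrun's --nproc_per_node value:
print(torch.cuda.is_available())   # False in the cpuonly env
print(torch.cuda.device_count())   # 1 for a single gaming GPU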

Let me switch environments and try without GPU but using gloo instead of nccl.
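
With the gloo change, the launch command itself should stay the same — torchrun sets RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT in the environment, which init_process_group reads:

$ torchrun --nproc_per_node 1 example_completion.py --ckpt_dir CodeLlama-34b/ --tokenizer_path CodeLlama-34b/tokenizer.model --max_seq_len 128 --max_batch_size 4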

I can’t give advice about GPU stuff; it’s normally not obvious to me what to do.

But again, if you search the issues there is one with some pointers: