Lib conflicts with ray and spacy

I’m deploying an ML model on Anyscale. The model uses Ray, transformers, and spaCy.

I’m getting this error while trying to run my process:

Traceback (most recent call last):
  File "/tmp/ray/session_2024-01-04_07-06-56_749724_4614/runtime_resources/working_dir_files/_ray_pkg_9b970ef0493576c1/processor_kwextraction.py", line 229, in <module>
    model = en_core_web_trf.load()
  File "/home/ray/anaconda3/lib/python3.9/site-packages/en_core_web_trf/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy/util.py", line 649, in load_model_from_init_py
    return load_model_from_path(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy/util.py", line 506, in load_model_from_path
    nlp = load_model_from_config(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy/util.py", line 554, in load_model_from_config
    nlp = lang_cls.from_config(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy/language.py", line 1788, in from_config
    nlp = lang_cls(vocab=vocab, create_tokenizer=create_tokenizer, meta=meta)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy/language.py", line 163, in __init__
    util.registry._entry_point_factories.get_all()
  File "/home/ray/anaconda3/lib/python3.9/site-packages/catalogue/__init__.py", line 110, in get_all
    result.update(self.get_entry_points())
  File "/home/ray/anaconda3/lib/python3.9/site-packages/catalogue/__init__.py", line 125, in get_entry_points
    result[entry_point.name] = entry_point.load()
  File "/home/ray/anaconda3/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/home/ray/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy_transformers/__init__.py", line 1, in <module>
    from . import architectures
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy_transformers/architectures.py", line 6, in <module>
    from .layers import TransformerModel, TransformerListener
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy_transformers/layers/__init__.py", line 1, in <module>
    from .listener import TransformerListener
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy_transformers/layers/listener.py", line 4, in <module>
    from ..data_classes import TransformerData
  File "/home/ray/anaconda3/lib/python3.9/site-packages/spacy_transformers/data_classes.py", line 3, in <module>
    import torch
  File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/__init__.py", line 235, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: ncclRedOpDestroy

Ray image: anyscale/ray:2.1.0-py39-cu112

These are the dependencies:

pandas
pymongo==4.1.1
pymongo[srv]
spacy[cuda112]==3.4.2
kafka-python
elasticsearch==7.9.1
htmldate
urllib3<1.27

These are the post build dependencies:

python -m pip install "pymongo[srv]"
pip install transformers
pip install -U sentence-transformers
python -m spacy download en_core_web_trf
python -m spacy download en_core_web_md

The current PyTorch binaries ship with NCCL>=2.18 and ncclRedOpDestroy was introduced in NCCL==2.11.4 approx. 3 years ago.
I don’t know which dependencies ray uses, but you might want to double check NCCL wasn’t downgraded to an ancient version.
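
One way to double check this without importing torch (which is the step that fails here) is to list the installed packages whose names mention torch, NCCL, or CUDA. This is only a minimal sketch using the standard library; the helper name `find_distributions` and the keyword list are illustrative assumptions, not part of Ray or PyTorch. It only sees pip/conda-installed Python packages, not system libraries baked into the image.

```python
from importlib import metadata


def find_distributions(keywords=("torch", "nccl", "cuda", "cudnn")):
    """Return {package name: version} for installed packages matching a keyword."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Some broken installs have no Name field; skip those.
        if name and any(key in name.lower() for key in keywords):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(find_distributions().items()):
        print(f"{name}=={version}")
```

Running this inside the same environment the Ray worker uses (not on your laptop) is what matters, since the post-build steps may have changed what the base image shipped.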

So from the logs, I see that I have NCCL version 2.18.1

nvidia-nccl-cu12==2.18.1
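
Note that the pip listing shows which NCCL wheel is installed, which is not necessarily the libnccl that the dynamic loader resolves when torch is imported; an older copy elsewhere in the image may be picked up first. A rough way to test whether a given library exports the missing symbol, assuming a Linux image where the library is named `libnccl.so.2` (the helper name and the default path are my assumptions):

```python
import ctypes


def nccl_exports_symbol(lib_path="libnccl.so.2", symbol="ncclRedOpDestroy"):
    """Return True/False if the library loads, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # Library not found on the loader path, or not loadable.
        return None
    # ctypes raises AttributeError for a missing symbol, so hasattr works here.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print("ncclRedOpDestroy exported:", nccl_exports_symbol())
```

Passing an absolute path as `lib_path` lets you compare the copy bundled with the pip wheel against any system-wide copy.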

I have resolved this issue. For anyone who runs into it in the future: the problem was an incompatibility between the Ray and CUDA versions. It went away after switching to CUDA 11.6 (cu116) with Ray 2.5.0.
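
After switching images, a quick sanity check along these lines can confirm that torch imports cleanly and report which CUDA and NCCL versions it sees. This is a sketch, not something from the original setup; the function name `torch_cuda_report` is hypothetical, and the NCCL query is wrapped defensively because it may not be available on every build.

```python
def torch_cuda_report():
    """Collect torch/CUDA/NCCL version info, or the import error if torch fails."""
    try:
        import torch
    except ImportError as exc:
        # An undefined-symbol failure like the one above also surfaces here.
        return {"import_error": str(exc)}

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        try:
            report["nccl"] = torch.cuda.nccl.version()
        except (RuntimeError, AttributeError) as exc:
            report["nccl_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```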