Missing Linux x86-64 large wheels (CUDA bundled) -- cannot install alongside TensorFlow

Hi,

Is it possible to get the large wheels for PyTorch 2.2 and later? They seem to have been replaced by the small wheels; see: Why are we keep building large wheels · Issue #113972 · pytorch/pytorch · GitHub.
In the small wheels, the versions of the CUDA libraries from PyPI are pinned exactly, which makes it difficult to install PyTorch alongside TensorFlow in the same container/environment. TensorFlow also pins these libraries (cudnn, etc.), but to a different minor version.
I would also appreciate any other way to make this setup work.

Thanks

Do you see any issues installing PyTorch with other libraries? As long as only the minor versions differ, you should be fine using the latest ones.

Yes. I tried with both pip and PDM.

~/.env> pip list
Package    Version
---------- -------
pip        24.0
setuptools 69.1.1
wheel      0.42.0

with PDM:

.env> pdm add "tensorflow[and-cuda]==2.15.0.post1" "torch==2.2.1"
Adding packages to default dependencies: tensorflow==2.15.0.post1, torch==2.2.1
🔒 Lock failed
WARNING: Unable to find a resolution for nvidia-cuda-nvrtc-cu12
because of the following conflicts:
  nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64" (from torch@2.2.1)
  nvidia-cuda-nvrtc-cu12==12.2.140 (from tensorflow@2.15.0.post1)
To fix this, you could loosen the dependency version constraints in pyproject.toml. See
https://pdm-project.org/latest/usage/dependency/#solve-the-locking-failure for more details.
See /tmp/pdm-lock-woszxaob.log for detailed debug log.
[ResolutionImpossible]: Unable to find a resolution

with pip:

.env> pip install "tensorflow[and-cuda]==2.15.0.post1" "torch==2.2.1" --dry-run
Collecting tensorflow==2.15.0.post1 (from tensorflow[and-cuda]==2.15.0.post1)
  Downloading tensorflow-2.15.0.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting torch==2.2.1
  Downloading torch-2.2.1-cp311-cp311-manylinux1_x86_64.whl.metadata (26 kB)
Collecting absl-py>=1.0.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting astunparse>=1.6.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting flatbuffers>=23.5.26 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading flatbuffers-23.5.26-py2.py3-none-any.whl.metadata (850 bytes)
Collecting gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading gast-0.5.4-py3-none-any.whl.metadata (1.3 kB)
Collecting google-pasta>=0.1.1 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.5/57.5 kB 795.6 kB/s eta 0:00:00
Collecting h5py>=2.9.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading h5py-3.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.5 kB)
Collecting libclang>=13.0.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading libclang-16.0.6-py2.py3-none-manylinux2010_x86_64.whl.metadata (5.2 kB)
Collecting ml-dtypes~=0.2.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading ml_dtypes-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting numpy<2.0.0,>=1.23.5 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 781.2 kB/s eta 0:00:00
Collecting opt-einsum>=2.3.2 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.5/65.5 kB 831.4 kB/s eta 0:00:00
Collecting packaging (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading protobuf-4.25.3-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Requirement already satisfied: setuptools in /home/leap/micromamba/envs/test/lib/python3.11/site-packages (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1) (69.1.1)
Collecting six>=1.12.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting termcolor>=1.1.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading termcolor-2.4.0-py3-none-any.whl.metadata (6.1 kB)
Collecting typing-extensions>=3.6.6 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting wrapt<1.15,>=1.11.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading wrapt-1.14.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading tensorflow_io_gcs_filesystem-0.36.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Collecting grpcio<2.0,>=1.24.3 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading grpcio-1.62.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting tensorboard<2.16,>=2.15 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading tensorboard-2.15.2-py3-none-any.whl.metadata (1.7 kB)
Collecting tensorflow-estimator<2.16,>=2.15.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading tensorflow_estimator-2.15.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting keras<2.16,>=2.15.0 (from tensorflow==2.15.0.post1->tensorflow[and-cuda]==2.15.0.post1)
  Downloading keras-2.15.0-py3-none-any.whl.metadata (2.4 kB)
Collecting filelock (from torch==2.2.1)
  Downloading filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting sympy (from torch==2.2.1)
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch==2.2.1)
  Downloading networkx-3.2.1-py3-none-any.whl.metadata (5.2 kB)
Collecting jinja2 (from torch==2.2.1)
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting fsspec (from torch==2.2.1)
  Downloading fsspec-2024.2.0-py3-none-any.whl.metadata (6.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.1)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.1)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.1)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.1)
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.1)
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.1)
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.1)
  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.1)
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.7 kB)
Collecting triton==2.2.0 (from torch==2.2.1)
  Downloading triton-2.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
INFO: pip is looking at multiple versions of tensorflow[and-cuda] to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install tensorflow[and-cuda]==2.15.0.post1 and torch==2.2.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    torch 2.2.1 depends on nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64"
    tensorflow[and-cuda] 2.15.0.post1 depends on nvidia-cublas-cu12==12.2.5.6; extra == "and-cuda"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Also, using the PyTorch wheels from download.pytorch.org gives the same error (it’s the same wheel as the one on PyPI):

.env> pdm add "tensorflow[and-cuda]==2.15.0.post1" "torch @ https://download.pytorch.org/whl/cu121/torch-2.2.1%2Bcu121-cp311-cp311-linux_x86_64.whl"
Adding packages to default dependencies: tensorflow==2.15.0.post1, torch @
https://download.pytorch.org/whl/cu121/torch-2.2.1%2Bcu121-cp311-cp311-linux_x86_64.whl
🔒 Lock failed
WARNING: Unable to find a resolution for nvidia-cublas-cu12
because of the following conflicts:
  nvidia-cublas-cu12==12.1.3.1; platform_system == "Linux" and platform_machine == "x86_64" (from
torch@https://download.pytorch.org/whl/cu121/torch-2.2.1%2Bcu121-cp311-cp311-linux_x86_64.whl)
  nvidia-cublas-cu12==12.2.5.6 (from tensorflow@2.15.0.post1)
To fix this, you could loosen the dependency version constraints in pyproject.toml. See
https://pdm-project.org/latest/usage/dependency/#solve-the-locking-failure for more details.
See /tmp/pdm-lock-ff18en7s.log for detailed debug log.
[ResolutionImpossible]: Unable to find a resolution

If I don’t pin TensorFlow or PyTorch, a lower version of TensorFlow (2.14.1) or of PyTorch gets installed. Given that they both pin their CUDA dependencies down to the minor version, I don’t think I will ever be able to build an up-to-date and consistent environment this way. With the large Linux wheels, I was able to get the CUDA packages for TensorFlow from PyPI, and PyTorch did not care because it used its own bundled libraries.

You could still install one of the packages without its requirements and would have the same experience as before. I.e., previously the first imported package (in this case TF2 or PyTorch) would load its CUDA dependencies and the second one would reuse them. If you now install one package with --no-deps, you would use the CUDA dependencies from the other one, assuming you want to mix both. If not, separate virtual environments might be needed.
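For example, a rough sketch of that workaround with the versions from this thread (hypothetical and untested; mixing CUDA minor versions at runtime is not guaranteed to work, and torch’s non-CUDA dependencies must be listed by hand):

```shell
# Let TensorFlow's extra pull the nvidia-* CUDA wheels from PyPI.
pip install "tensorflow[and-cuda]==2.15.0.post1"

# Install torch without its pinned nvidia-* dependencies, so it
# picks up the CUDA libraries TensorFlow already installed.
pip install --no-deps "torch==2.2.1"

# torch's remaining (non-CUDA) dependencies still need to be installed;
# this list is taken from the pip output above and may change per release.
pip install filelock sympy networkx jinja2 fsspec typing-extensions
```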

Thanks. I could, but I would need to go through their dependency lists for each version, because they don’t completely overlap. Also, the environment would not be easily upgradable.
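One way to avoid walking both dependency lists by hand might be PDM’s resolution overrides, the mechanism its own error message links to. A hypothetical pyproject.toml fragment, forcing the TensorFlow-side versions from the resolver errors above (whether PyTorch then works at runtime against these slightly newer CUDA libraries is untested):

```toml
# Sketch only: force single versions of the conflicting nvidia wheels.
# Versions taken from the lock errors in this thread; adjust as needed.
[tool.pdm.resolution.overrides]
nvidia-cublas-cu12 = "12.2.5.6"
nvidia-cuda-nvrtc-cu12 = "12.2.140"
```

Overrides bypass the resolver’s conflict check for these packages, so this trades a lock failure for a runtime risk; it would still need to be re-checked on each upgrade.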