PyTorch 2.0 distribution that uses cuda only if available?

Hey folks,

After upgrading to torch==2.0 yesterday, I found that I can no longer run torch programs on systems that don’t have CUDA.

Here’s my observation on the various distributions:

# PyTorch 2.0 for use with CUDA, CUDA libs get installed as pip deps if unavailable on the system. Wheel size w/o deps: ~620MB
pip3 install torch==2.0
# PyTorch 2.0 with bundled CUDA 11.7. Wheel size w/o deps: 1.8GB
pip3 install torch==2.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
# PyTorch 2.0 w/o CUDA support. Wheel size w/o deps: 195MB
pip3 install torch==2.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu

If I now look at what got installed from the first option (site-packages/torch/lib), I see, among other things:

-rwxrwxr-x 1 ubuntu ubuntu 487M Apr 11 08:33 libtorch_cpu.so
-rwxrwxr-x 1 ubuntu ubuntu 627M Apr 11 08:33 libtorch_cuda.so

so my expectation would be that this distribution allows me to use Torch with or without CUDA support.

However, in reality, import torch fails on a non-CUDA system (without the CUDA pip deps installed): ldd libtorch_global_deps.so shows that the global-deps library, which is unconditionally loaded at package import time, is linked against a number of CUDA libraries (libcublas.so, libcurand.so and others), and these fail to load on a non-CUDA system.

This is apparently different from the behavior in torch==1.11.0 (the previous version I was using). There, I also see

-rwxrwxr-x 1 ubuntu ubuntu 433M Apr 11 09:51 libtorch_cpu.so
-rwxrwxr-x 1 ubuntu ubuntu 994M Apr 11 09:50 libtorch_cuda.so

in the lib folder of the package, and I can indeed use CUDA on a CUDA-system, but the libtorch_global_deps.so does not link against any CUDA libraries:

$ ldd venv/lib/python3.9/site-packages/torch/lib/libtorch_global_deps.so | grep cuda
$

Does anyone have any insight into why this change was made? It makes it much, much harder to use a consistent set of dependencies across different systems and architectures.

Now, this wouldn’t matter if installing the right version of torch were regarded as the system’s responsibility. In our case, however, we use Bazel as our toolchain and therefore need some level of hermetic homogeneity between building our production containers (using a base image with CUDA system libs) and, e.g., running basic functional tests in CI (on runners that have no GPU and where installing CUDA is a waste of time and space).

Concretely, we don’t want to install over a GB of CUDA libraries as a Python dependency, because shipping them in a base image layer is more efficient. We also don’t want to install CUDA in the image used for our CI runners, because they will never have GPUs. But we do want to be able to run bazel test on a GPU-enabled, CUDA-enabled Linux system and have Torch use CUDA, without Bazel resolving dependencies differently from CI under the hood.
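For illustration, the per-environment split we are trying to avoid would look roughly like this (a sketch: the index URLs match the install matrix above, but the file names are made up):

```
# requirements-gpu.txt (hypothetical): wheels with CUDA deps for production images
--extra-index-url https://download.pytorch.org/whl/cu117
torch==2.0.0+cu117

# requirements-ci.txt (hypothetical): CPU-only wheels for GPU-less CI runners
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.0.0+cpu
```

Maintaining two diverging lockfiles like these is exactly the kind of difference between CI and production that Bazel is supposed to rule out.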

I guess we could address this at the Bazel level if there’s no other way, but why do we have to now, when torch 1.11 didn’t require it while still giving us CUDA support?

I don’t fully understand this claim. Are you installing the default PyTorch pip wheels hosted on PyPI, with their CUDA 11.7 dependencies (installed via CUDA pip wheels), and then deleting those dependencies manually?
And afterwards you try to import torch and wonder why the import fails?

Without deleting libraries manually, the CUDA-enabled wheels still run:

CUDA_VISIBLE_DEVICES="" python -c "import torch; print(torch.cuda.is_available()); print(torch.__version__)"
False
2.0.0+cu117
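Application code can then stay device-agnostic with the usual runtime check. A minimal sketch of that pattern (a plain bool stands in for torch.cuda.is_available() here, so the logic runs even without torch installed):

```python
def select_device(cuda_available: bool) -> str:
    """Pick the device string for this process.

    In real code the flag comes from torch.cuda.is_available();
    a plain bool is used here so the sketch runs without torch.
    """
    return "cuda" if cuda_available else "cpu"

# On a CPU-only machine, torch.cuda.is_available() returns False:
print(select_device(False))  # -> cpu
```

With this, the same script works on both the GPU-enabled production system and a GPU-less CI runner.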

However, if you want to install CPU-only wheels you could select them from the install matrix.

This is caused by a change in our build process: previously we statically linked the CUDA math libs into libtorch*, so they were not directly visible as dependent files.
Besides that, the same user experience is exposed: you can install the CUDA-enabled wheels (the default PyPI wheels) on a CPU-only system and they will work. If you want the CPU-only wheels, you can select them from the install matrix.

Could you describe what exactly is failing if you don’t manually delete dependencies?

Thanks for your response, you’re absolutely right. What I saw was actually a bug in the tool (based on pex) we use to create a Bazel lockfile from a requirements.txt. It operates under the assumption that wheels for different platforms all have the same requirements, just annotated with environment markers, but that’s not the case for torch: the requirements listed in the torch Linux wheel metadata carry linux/x86_64 environment markers, so it wouldn’t hurt to list them in the darwin wheel as well, but they aren’t there. In any case, the tool’s assumption is clearly too strict.
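For illustration, the platform-gated entries in the Linux wheel’s metadata look roughly like this (an abbreviated excerpt from memory; the exact package list, version pins, and marker text may differ):

```
Requires-Dist: nvidia-cublas-cu11 ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu11 ; platform_system == "Linux" and platform_machine == "x86_64"
```

The darwin wheel simply omits these lines rather than carrying them with never-matching markers, which is what trips up the tool.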

So the tool accidentally “deleted” the CUDA deps, because they don’t occur in the darwin wheel. And that accident happened to be the “right” thing for our scenario, since we ship our code in an image whose base layer has all required CUDA deps installed system-wide. But it is clearly a hack, given what the PyTorch team intended in terms of distribution flavors.

So I guess with one mystery solved, my questions can basically be condensed to the following:

  1. Is there a pre-compiled torch 2.0.0 package that neither bundles CUDA nor pulls in CUDA as Python dependencies, but instead uses CUDA libraries installed on the system?
  2. Same as 1, but additionally falling back gracefully to CPU-only mode if the required CUDA libs are not present on the system?
  3. If the answer to either of the above is no because such a setup is too uncommon, is there a way to create such a package by compiling from source (without modifying it)?
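Regarding 3, my understanding is that a from-source build can be pointed at the system CUDA toolkit via environment variables. A rough sketch (USE_CUDA and CUDA_HOME are the knobs I believe are relevant; exact steps may vary between releases):

```shell
# Build a torch 2.0.0 wheel that links against the CUDA toolkit
# already installed on the system (no bundled or pip-installed CUDA libs)
git clone --recursive --branch v2.0.0 https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
export USE_CUDA=1                  # enable CUDA support in the build
export CUDA_HOME=/usr/local/cuda   # point at the system toolkit
python setup.py bdist_wheel        # resulting wheel lands in dist/
```

Whether such a wheel would also fall back gracefully on a CUDA-less system (question 2) is what I’m unsure about.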

Besides that, the same user experience is exposed: you can install the CUDA-enabled wheels (the default PyPI wheels) on a CPU-only system and they will work. If you want the CPU-only wheels, you can select them from the install matrix.

It still seems that there is a gap between the two options (CUDA-enabled wheels with CUDA libs as pip dependencies vs. CPU-only wheels): while both run on a GPU-less system, the former needs an extra GB or so of dependencies installed. That is not just a formal requirement: the package will in fact not run without those deps, even though on such a system it offers the same functionality as the CPU-only package, which doesn’t need them.

This is caused by a change in our build process: previously we statically linked the CUDA math libs into libtorch*, so they were not directly visible as dependent files.

Hmm, I see, thanks for the explanation.