How do you deal with the problem that an app/program you want others to be able to install and run without any hassle may require some arbitrary pytorch version, depending on what kind of GPU they have installed?
Normally, it is already hard to deal with the usual dependency hell when a program requires many different python packages which all must be there in compatible versions.
But if my program requires torch, e.g. because sentence_transformers depends on it, this is not enough: if the user installing the package has a different, older GPU, the program will not work.
Figuring out which version of torch to install is very hard, and depending on how the application dependencies are managed, replacing the torch version configured there with one that works is also hard, at least in my experience.
For example, I tried to get a program that uses sentence_transformers/torch to run on a laptop with a Quadro P2000 GPU. nvidia-smi reports CUDA 13.0 and driver 580.95.05.
When I run a test program to check torch, I get the message:
"Please install PyTorch with a following CUDA configurations: 12.6"
However, I have installed torch using pip install torch --index-url https://download.pytorch.org/whl/cu126 into my active environment.
But torch still reports version 2.9.1+cu128; I have no idea why.
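One way to see which CUDA build is actually active is to look at the local version label PyTorch appends to its version string (the part after the `+`). A minimal sketch, with the version strings hard-coded for illustration (in a real environment you would read `torch.__version__` instead):

```python
# Split a PyTorch-style version string into the release and its build label.
# The label after "+" identifies the CUDA build, e.g. "cu126" vs. "cu128".

def split_torch_version(version: str) -> tuple[str, str]:
    """Return (release, local_label), e.g. ('2.9.1', 'cu128').

    An empty local label usually indicates a CPU-only or source build.
    """
    release, _, local = version.partition("+")
    return release, local

print(split_torch_version("2.9.1+cu128"))  # the build actually reported
print(split_torch_version("2.9.1+cu126"))  # the build that was intended
```

If the label torch reports differs from the index URL you installed from, a second copy of torch earlier on `sys.path` (for example in the base environment) is typically shadowing the one you just installed; `pip show torch` inside the activated environment can help confirm which copy is being picked up.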
Is there any user friendly approach to this? How do you deal with these issues?
Just telling the users: get a more recent GPU if you want to use this app is not really an option.
I don't install pytorch on my "raw" machine or in my base conda environment. For each
version of pytorch I want to use, I create a new conda environment, activate it, and install
pytorch into that environment. On my "raw" machine, in my base conda environment, and
in a newly-created conda environment I won't have any pytorch (so, in particular, I won't
have, for example, "version 2.9.1+cu128"). After installing pytorch in a new environment,
I get precisely the version I installed.
(I haven't tried installing two pytorch versions in the same conda environment, so I don't
know which version would win.)
I can't comment on sentence_transformers or on the specific versions you mention.
However, as a general comment, if you want to use some pytorch code that requires a more
recent pytorch version that in turn requires a more recent gpu, you will either have to tell your users
to get a more recent gpu or go into the pytorch code in question and tweak it to break its
dependency on the more recent pytorch version.
If the pytorch code in question uses some substantive new pytorch feature, this could be a lot
of work. But if it just uses some new convenience function or some simple syntax update, you
might well be able to tweak it to use the old-style function or syntax without too much effort.
(Tweaking a pytorch release to use an older gpu that it doesn't support out of the box is
probably something you don't want to try.)
Thanks, but in the end this sounds a lot like Pytorch is simply not usable for end-user software, i.e. for software to be installed and run by users who may not know anything about Python, let alone Pytorch, compute capabilities, CUDA versions, dependency compatibility issues etc.
Which is really pretty sad.
As a developer and ML researcher, I do not have too many problems eventually getting torch running on some particular hardware, though it may still sometimes be impossible to find a set of compatible library versions with some dependencies (in which case one can take advantage of pip's inability to do this right…).
But creating a software distribution that needs Pytorch and should be easily installable for end users seems to be impossible (Note that other forms of software which make use of a wide range of GPUs and CUDA versions do NOT have that problem).
Libraries that support GPU for tensor calculations are not researcher/engineer-only any more and I think making this work more flexibly should be a modern requirement.
You don't need conda, FYI - python3 virtual environments are part of the standard library. They're pretty easy to use. For end-users, have you looked into pipx?
I do agree in general though. It's not clear to me why the latest pytorch, cuda compiler and nvidia driver can no longer support my tesla k40c. All of these new "AI" models are still composed from the same basic vector operations (correct me if I'm wrong), still optimized by gradient descent (or some derived algorithm), and so on. If anything, libraries and compilers should serve to abstract away hardware details/differences. This seems like a better use of labor than implementing/maintaining lots of features/accessories.
For older GPU architectures I would recommend trying to install our PyTorch binaries built with an older CUDA toolkit. Our current releases support GPUs starting from Maxwell, which was released almost 12 years ago, to Blackwell, which is the current architecture.
The error message explicitly states to install a PyTorch binary built with CUDA 12.6 for Pascal support and I would recommend trying to do so.
This is most likely caused by your local environment if e.g. multiple PyTorch binaries were installed. I would stick to @KFrank's advice and use separate virtual environments, e.g. via conda, to install any PyTorch binary.
It mainly comes down to maintainability, performance, and binary size. Testing the latest SW stack on devices which were released over 13 years ago can be non-trivial especially if no public cloud providers support these devices anymore. Besides that, PyTorch devs should also not be blocked by adding guards to latest features if these would break ancient generations (again, the testability question is important here as not everyone can even get access to K40s anymore). Lastly, our binaries already span 8 GPU families (Maxwell - Blackwell).
Thanks, as I have said, I personally got the correct version to install and work eventually.
However, the real point I am trying to make is that with the way pytorch installation currently works, it is basically impossible to distribute software to end users who just want it to install correctly for their hardware and who are not Python/tech-knowledgeable enough to go through all those steps of checking compute capabilities, CUDA versions etc. and then manually installing the correct pytorch version themselves.
With most other python packages this is something I can do: I just tell them to e.g. pip install my software and run it.
With software that requires torch this simply does not work in many cases. And that is a problem, I think. The correct installation should happen automatically, not based on manual decisions by experts.
I think the rules and information needed to automatically install the correct libraries for specific hardware exist, but building this into pytorch is apparently not a priority.
Which means that pytorch will remain a tool mainly used by researchers and Python hackers, but not something that can be put into software ready to install by end users on whatever hardware they might have.
You might be interested in our Wheel Variants workflow, which enhances the Python package management by trying to utilize knowledge about the installed driver, GPUs, etc., and could work for your use cases.
I still believe you can do the same for PyTorch in the default use case via pip install torch, which installs a PyTorch binary with CUDA support for all compute capabilities between sm_70 and sm_120, spanning all GPUs released from 2017 to now. Older devices need the aforementioned extra steps to install the latest PyTorch build with an older CUDA build; new devices could benefit from the latest stack. Keep in mind that pip itself provides no way to solve this, which is one of the goals of the Wheel Variant effort.
I think you are mistaken here, as I don't believe pip allows you to select any binary based on the compute capability, driver, etc., but please correct me if I'm wrong.
No, I also think that the installation of pre-built packages does not allow selecting binaries based on compute capability. There may be a bunch of work-arounds though: either offering a larger bundle of pre-built files and selecting the correct ones during installation, or having the ability to fetch the necessary specific libraries or drivers after installation, based on the hardware.
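The selection logic itself is not complicated; what is missing is a standard hook to run it at install time. A hypothetical sketch of such a selector, where the capability-to-wheel mapping is an assumption pieced together from the ranges mentioned in this thread (default wheels covering sm_70 to sm_120, cu126 builds for Pascal/Maxwell), not an official compatibility table:

```python
# Hypothetical installer helper: map a CUDA compute capability to a
# PyTorch wheel index URL. The ranges follow the discussion above and
# are illustrative only, not an official compatibility matrix.

DEFAULT_INDEX = "https://download.pytorch.org/whl/cu128"
LEGACY_INDEX = "https://download.pytorch.org/whl/cu126"

def pick_index_url(capability: tuple[int, int]) -> str:
    """Choose a wheel index for a (major, minor) compute capability.

    On a machine with torch already present, the capability could be
    read via torch.cuda.get_device_capability(); at install time it
    would have to come from the driver instead (e.g. nvidia-smi/NVML).
    """
    major, minor = capability
    if (7, 0) <= (major, minor) <= (12, 0):
        return DEFAULT_INDEX   # sm_70 .. sm_120: default wheels
    if (5, 0) <= (major, minor) < (7, 0):
        return LEGACY_INDEX    # Maxwell/Pascal: older CUDA build
    raise RuntimeError(f"No known PyTorch build for sm_{major}{minor}")

# A Quadro P2000 (Pascal) reports capability (6, 1):
print(pick_index_url((6, 1)))  # → https://download.pytorch.org/whl/cu126
```

Something like this could run as a post-install step or inside a launcher script; the Wheel Variants effort mentioned earlier in the thread aims to standardize exactly this kind of hardware-aware selection so that individual projects do not have to hand-roll it.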
The basic issue here is what should define a version of a package or library: I think the main defining properties of a library version should be the functionality, issues fixed, and api definition, but NOT the supported OS or hardware. Mixing those two will not just make it infinitely harder to resolve dependencies, but also make the number of "versions" multiply.
This is definitely not a new problem, engineers developing e.g. games or graphical software for different OS and a huge range of different hardware/gpu options have had to solve this for decades and there again, the best solution seems to be to distinguish between software version and drivers/hardware support options instead of mixing them all together.