Determine which CUDA is in use in a Conda environment

IcarusWizard · February 21, 2024, 3:59pm

Hi,

I am a big fan of Conda and always use it to create virtual environments for my experiments, since it makes managing different CUDA versions easy. I normally install PyTorch the recommended conda way, e.g. conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia. This has worked for years without any problems.

But recently, I have some RL experiments that need to run in a Docker container based on nvidia/cudagl:11.4.2-devel-ubuntu20.04, which has a system-level CUDA 11.4 install. Somehow, the conda install no longer works there: torch.cuda.is_available() always returns False. However, when I install PyTorch with pip, it also seems to install some CUDA-related packages, and torch.cuda.is_available() returns True.

Now I am confused. If I have multiple CUDA installations from different sources, e.g. system-level, conda, and pip, which CUDA is actually in use? I have seen some posts saying that torch.version.cuda can be used to check the CUDA version, but I feel it always returns the version the binary was compiled with, even when torch.cuda.is_available() returns False. How exactly does PyTorch locate these dependencies?
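For reference, the two values mentioned above can be printed side by side with something like the sketch below. The helper name describe_torch_cuda is made up for illustration; it only uses torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name(), and it returns early if PyTorch is not installed in the active environment.

```python
import importlib.util


def describe_torch_cuda():
    """Summarise PyTorch's CUDA build version and runtime status.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # True only if a usable driver and GPU are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

Running this in each environment (conda install vs. pip install, inside and outside the container) should show whether the build version and the runtime availability disagree.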

ptrblck · February 21, 2024, 4:25pm

The PyTorch binaries ship with their own CUDA dependencies, which are selected/specified in the install command. Your locally installed CUDA toolkit won’t be used unless you build PyTorch from source or build a custom CUDA extension.
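One way to check this on Linux is to look at which CUDA shared libraries the running Python process has actually mapped. The snippet below is only a rough sketch: the function name loaded_cuda_libraries and the keyword list are assumptions chosen for illustration, and it relies on /proc/self/maps, so it returns an empty list on other platforms.

```python
import os


def loaded_cuda_libraries(maps_path="/proc/self/maps"):
    """Return sorted paths of CUDA-related shared libraries mapped into a process.

    Hypothetical helper; the keyword list below is an illustrative guess,
    not an exhaustive list of CUDA libraries.
    """
    if not os.path.exists(maps_path):
        return []  # Not Linux, or /proc is not mounted.

    keywords = ("libcudart", "libcublas", "libcudnn", "libcuda.so")
    paths = set()
    with open(maps_path) as maps_file:
        for line in maps_file:
            # Fields: address perms offset dev inode [pathname]
            parts = line.split(maxsplit=5)
            if len(parts) < 6:
                continue  # Anonymous mapping without a pathname.
            path = parts[5].strip()
            if any(key in os.path.basename(path) for key in keywords):
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    try:
        import torch

        torch.cuda.is_available()  # Touch CUDA so the libraries get loaded.
    except ImportError:
        pass
    for lib in loaded_cuda_libraries():
        print(lib)
```

If the printed paths point into the conda environment or site-packages rather than /usr/local/cuda, the bundled libraries are the ones in use. Note that libcuda.so is the driver library and comes from the system (or is injected into the container), not from the PyTorch install.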

IcarusWizard · February 21, 2024, 4:32pm

Thanks for the reply. So you mean that with the pip installation, some CUDA dependencies are installed inside the PyTorch package, so that it never picks up other versions by mistake. Is this understanding correct?

But why does conda fail in this case? It should also use the CUDA it shipped with, right?

ptrblck · February 21, 2024, 4:52pm

Yes, the pip wheels depend on the CUDA libs hosted on PyPI (e.g. nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, etc.) while the conda binaries use their corresponding packages from the nvidia conda channel.
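To see which of these PyPI-hosted CUDA packages ended up in a given environment, you could list the installed distributions whose names start with nvidia-. This is a minimal sketch using only the standard library's importlib.metadata; the function name list_nvidia_pip_packages is made up for illustration, and filtering on the nvidia- prefix is an assumption based on the package names above.

```python
from importlib import metadata


def list_nvidia_pip_packages():
    """Map installed distribution names starting with 'nvidia-' to their versions.

    Hypothetical helper; it inspects the active Python environment only.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith("nvidia-"):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    packages = list_nvidia_pip_packages()
    if not packages:
        print("No nvidia-* distributions found in this environment.")
    for name, version in packages.items():
        print(f"{name}=={version}")
```

In a conda environment the equivalent check would be conda list, looking at the packages that come from the nvidia channel.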

I don’t know why the conda binaries do not work inside your container, but outside they seem to work.