Can I run the default CUDA 11.3 conda install on a CUDA 11.6 device?

Hey,

can I run the default CUDA 11.3 conda install on a CUDA 11.6 device, or do I need to downgrade to CUDA 11.3 first? I have an RTX 3080.

I tried installing the CUDA 11.6 nightly binaries first, following this post:

But with the conda intelpython_full python=3 distribution that I am using, this does not work and raises an error in matplotlib related to the freetype package (see here):

So, there seem to be two choices:

  • Downgrade the local CUDA toolkit to 11.3. In that case, I have an existing CUDA 11.3 conda environment on another PC that works fine, which I could simply clone, so it would likely work. The question is: would it even be necessary to downgrade CUDA, or is the newer version 11.6 backward compatible with 11.3?
  • Somehow fix the matplotlib issue. So far I have not managed to solve it. I tried reinstalling the involved packages from various channels, but in combination with the nightly binaries nothing really resolves it. I could also try installing into a conda environment that does not use the intelpython distribution, but I am reluctant to do so, as the intelpython distribution is pretty fast.

EDIT: I believe I found an answer here:

So, when I use conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch, this automatically handles everything running on 11.3, even though nvidia-smi shows CUDA 11.6 as the current version?

Thanks!
Best, JZ


From this post you can install PyTorch built with CUDA 11.6,
and other packages like torchserve are also available.
pip install torch --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116
Just add it to the pip install field…

I tried setting up a conda environment with intelpython 3 first, then installed the nightly binaries via pip, as you recommended. However, I end up getting crashes in the new environment related to a matplotlib issue with the freetype package, which I have not managed to solve. Could you give me a hint on how to set up a full, proper environment with 11.6?

Furthermore, I tried running some training without using any matplotlib functions. This worked, but somehow my RTX 3080 is not much faster than my GTX 1650 Ti (a notebook onboard GPU). To be more precise, the RTX delivers roughly a 2x speedup, which does not seem like much. So either my CUDA 11.6 PyTorch does not fully utilize the RTX's power, or 2x is really just the speedup I can expect? That would suck ^^

Thanks, best, JZ

No, you don’t need to downgrade your local CUDA toolkit, as it would only be used if you were building PyTorch from source or custom CUDA extensions.
The binaries ship with their own CUDA runtime and will work.

Create a new virtual environment and reinstall all packages there.
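The point about the binaries shipping their own runtime can be checked directly. A minimal sketch: torch.version.cuda reports the CUDA runtime bundled with the installed binaries, while nvidia-smi reports the maximum CUDA version the driver supports, which is why the two can differ without being a problem.

```python
import torch

# The CUDA runtime version the binaries were built with (e.g. "11.3"),
# independent of what nvidia-smi reports; None for CPU-only builds.
print(torch.version.cuda)

# Whether a compatible GPU and driver are actually visible at runtime.
print(torch.cuda.is_available())
```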

TF32 has been disabled by default since ~3 weeks ago, so your nightly binary would also not use it.
You can re-enable it in case you are using the default float32 data type.
Also take a look at the performance guide and familiarize yourself with profilers to narrow down bottlenecks, as your training could also be blocked by e.g. the data loading while the GPU sits idle.
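Re-enabling TF32 could look like the sketch below. These are the standard torch.backends flags; TF32 only takes effect for float32 matmuls and convolutions on Ampere-class GPUs such as the RTX 3080.

```python
import torch

# Allow TF32 tensor cores for float32 matmuls and cuDNN convolutions.
# Only has an effect on Ampere or newer GPUs; a no-op elsewhere.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

print(torch.backends.cuda.matmul.allow_tf32)  # True
```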

Yeah, that’s what I did:

conda create -n pyt intelpython3_full python=3 -c intel
pip install torch --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116
conda install nb_conda_kernels

Afterwards, I get the common matplotlib/freetype error quoted in my post above, and I can’t get rid of it. So basically the environment works with PyTorch, but I can’t plot anything. -.-

Not sure if that’s the case for my env:

torch.backends.cuda.matmul.allow_tf32, torch.backends.cudnn.allow_tf32
>> (True, True)

by default. Is that what you are referring to?

Thanks, I already found a lot of useful stuff in there. It’s probably better to implement these performance tips in the working CUDA 11.3 environment before spending time building PyTorch from source (something I have no experience with) with CUDA 11.7 etc.
First and foremost, I will adapt my dataloader to use num_workers = cpu_count(), which is probably the most impactful measure to take for now, and then implement the various things from the guide. I will report back on the performance.
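The num_workers change could be sketched as follows. The TensorDataset here is just a stand-in for the real dataset, and cpu_count() workers is a starting point rather than a guaranteed optimum; the best value depends on the dataset and should be profiled.

```python
import torch
from multiprocessing import cpu_count
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset (assumption): 256 fake CIFAR-sized images with labels.
dataset = TensorDataset(
    torch.randn(256, 3, 32, 32),
    torch.randint(0, 10, (256,)),
)

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=cpu_count(),  # one loader worker per CPU core as a first guess
    pin_memory=True,          # speeds up host-to-GPU transfers
)

if __name__ == "__main__":
    for images, labels in loader:
        pass  # training step would go here
```

Whether cpu_count() workers actually helps depends on how expensive per-sample loading is; profiling the input pipeline, as suggested above, is the way to confirm it.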