[NEED HELP] Trouble with CUDA capability sm_86

No, as described here:

TORCH_CUDA_ARCH_LIST is an env var used for a source build and won’t change anything in the binaries.
You are currently selecting cudatoolkit=10.2, so select 11.1 instead for your Ampere GPU.
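For reference, the matching conda command at the time looked roughly like this (the exact package list and channels come from the PyTorch install selector, so treat this as a sketch):

```shell
# Sketch: install the prebuilt binaries with the CUDA 11.1 runtime
# (this is not a source build, so TORCH_CUDA_ARCH_LIST plays no role here).
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
```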

@Ilias_Giannakopoulos @eqy @rojas70 @Edwardius @Nakkhatra

(Sorry if this sounds too self-promotional) I think that my project here can be helpful for everyone in this discussion.

It has detailed explanations of the PyTorch build process and can be used to build wheels for local environments, including those with an RTX 3090.

Please visit and give feedback!

Also, please do not forget to star the repository if you find it solves your problem.

Hi, thanks for the suggestion. I have already re-installed PyTorch with conda for CUDA 11.1 and it worked. But sometimes scripts don’t work without specific versions of torch; say, for example, a script requires CUDA 10.2 and torch 1.8.0. How can I run that on my 3070 then?

You won’t be able to run it on the 3070 with the CUDA 10.2 binaries. However, I don’t see a reason why a script would depend on a specific CUDA version, so I would remove this requirement.
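If the script really pins torch 1.8.0, one possible workaround (assuming a 1.8.0 wheel built against the CUDA 11.1 runtime exists for your Python version) is to install that build instead of the default CUDA 10.2 one:

```shell
# Sketch: same torch version, but built against CUDA 11.1,
# so it supports sm_86 GPUs such as the 3070.
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
```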


Hi,
I am happy that someone mentioned this problem.
How did you fix it?
Would you please describe the exact steps?
Thanks

The PyTorch binaries have supported devices with a compute capability of 8.6 since the 1.7 release (any binaries with CUDA 11 would work), so could you describe your issue in more detail, please?

NVIDIA GeForce RTX 3070 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
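To illustrate why this message appears, here is a simplified, plain-Python sketch of the check PyTorch effectively performs: the device’s compute capability must be covered by the architectures the binary was compiled for. The helper name is hypothetical, and the sketch ignores PTX forward-compatibility for simplicity:

```python
def is_supported(device_capability, built_arch_list):
    """Return True if a GPU with the given (major, minor) compute
    capability matches one of the binary's compiled architectures.
    Simplified sketch: PTX forward-compatibility is ignored."""
    major, minor = device_capability
    # Arch entries look like "sm_75" or "compute_37"; compare the numeric part.
    return any(arch.split("_")[1] == f"{major}{minor}" for arch in built_arch_list)

# Arch list from the error message above (CUDA 10.2 binaries):
cu102_archs = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]
print(is_supported((8, 6), cu102_archs))   # RTX 3070 (sm_86) -> False
print(is_supported((7, 5), cu102_archs))   # Turing (sm_75)   -> True
```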

Actually, when I run the command "nvcc --version", I can see that the CUDA toolkit is this version:

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

And when I run the command "nvidia-smi", it shows that I have CUDA version 11.5 with the 495 driver.
I even installed the nightly version via Anaconda, but I still get the same error.

The PyTorch binaries with the CUDA 10.2 runtime ship for these architectures. As already mentioned a few times, you would need to install the PyTorch binaries with the CUDA 11.3 runtime, so select it in the install menu and make sure the install log also shows CUDA 11.3.
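For reference, the pip variant of that selection looked roughly like this at the time (the exact wheel names come from the install selector, so treat this as a sketch):

```shell
# Sketch: install the binaries built against the CUDA 11.3 runtime.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
```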

Thank you so much for your answer.
Actually, I installed the pip version for Linux and did not get that error anymore, so I think the installation succeeded.
But, after running my code, this shows:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution.
What could be the problem?

You could still be hitting the same error if it wasn’t properly resolved, you could be running out of memory, or there is indeed no cuDNN algorithm for your workload (I haven’t seen that before, as so far this error was always caused by other issues).

I have 16GB of memory, and by monitoring it I can see that it never passes the 12GB threshold, so I am not sure which installation step went wrong.
Besides, these are the last lines of the ".log" file that is created after running my code:

)
)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=1)
(features): Linear(in_features=256, out_features=128, bias=True)
(cls_layer): Linear(in_features=128, out_features=2, bias=True)
)
)
0.9033111181635459 : 0 lbl, 0.09668888183645404 : 1 lbl
0.8181309804896235 : 0 lbl, 0.18186901951037654 : 1 lbl
[0.90331113 0.09668888]

And this is the path setup for my CUDA libraries:

LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
if [ ! -d /usr/local/cuda/lib64 ]; then
    LD_LIBRARY_PATH=$HOME/usr/local/cuda/lib64:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH

Your local CUDA toolkit won’t be used unless you build PyTorch from source or build a custom CUDA extension. Verify that a CUDA 11 runtime is used in your binaries via print(torch.version.cuda) and also make sure that sm_80 and sm_86 are returned by torch.cuda.get_arch_list(). If both these requirements are met, please post a minimal executable code snippet which we could use to run into the cudnn error.
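As a plain-Python sketch of that verification (the helper name and the example arch lists are illustrative; in practice you would pass in the actual values of torch.version.cuda and torch.cuda.get_arch_list()):

```python
def ampere_supported(cuda_runtime, arch_list):
    """Sketch of the check above: the binary must ship a CUDA 11.x runtime
    and include the Ampere architectures in its compiled arch list.
    In practice, pass torch.version.cuda and torch.cuda.get_arch_list()."""
    return (cuda_runtime is not None
            and cuda_runtime.startswith("11.")
            and "sm_80" in arch_list
            and "sm_86" in arch_list)

# Example values as they might look for a CUDA 11.3 binary (illustrative):
print(ampere_supported("11.3", ["sm_37", "sm_50", "sm_60", "sm_70",
                                "sm_75", "sm_80", "sm_86"]))   # True
# Example values matching the CUDA 10.2 binaries from the error above:
print(ampere_supported("10.2", ["sm_37", "sm_50", "sm_60", "sm_61",
                                "sm_70", "sm_75"]))            # False
```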