[NEED HELP] Trouble with CUDA capability sm_86

No, as described here:

TORCH_CUDA_ARCH_LIST is an env var used for a source build and won’t change anything in the binaries.
You are currently selecting cudatoolkit=10.2, so select 11.1 instead for your Ampere GPU.
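For reference, the matching conda command at the time looked roughly like this (the exact package list and channels come from the PyTorch install selector, so treat this as a sketch):

```shell
# Sketch: install the prebuilt binaries with the CUDA 11.1 runtime
# (this is not a source build, so TORCH_CUDA_ARCH_LIST plays no role here).
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
```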

@Ilias_Giannakopoulos @eqy @rojas70 @Edwardius @Nakkhatra

(Sorry if this sounds too self-promotional) I think that my project here can be helpful for everyone in this discussion.

It has detailed explanations of the PyTorch build process and can be used to build wheels for local environments, including those with an RTX 3090.

Please visit and give feedback!

Also, please do not forget to star the repository if you find it solves your problem.

Hi, thanks for the suggestion. I have already re-installed PyTorch with conda for CUDA 11.1 and it worked. But sometimes scripts don’t work without specific versions of torch; say, for example, a script requires CUDA 10.2 and torch 1.8.0. How can I run that on my 3070 then?

You won’t be able to run it on the 3070 with the CUDA 10.2 binaries. However, I don’t see a reason why a script would depend on a specific CUDA version, so I would remove this requirement.
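If the script really pins torch 1.8.0, one possible workaround (assuming a 1.8.0 wheel built against the CUDA 11.1 runtime exists for your Python version) is to install that build instead of the default CUDA 10.2 one:

```shell
# Sketch: same torch version, but built against CUDA 11.1,
# so it supports sm_86 GPUs such as the 3070.
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
```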


Hi,
I am happy that someone mentioned this problem.
How did you fix it?
Would you please describe the exact steps?
Thanks

The PyTorch binaries have supported devices with a compute capability of 8.6 since the 1.7 release (any binaries with CUDA 11 would work), so could you describe your issue in more detail, please?

NVIDIA GeForce RTX 3070 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
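To illustrate why this message appears, here is a simplified, plain-Python sketch of the check PyTorch effectively performs: the device’s compute capability must be covered by the architectures the binary was compiled for. The helper name is hypothetical, and the sketch ignores PTX forward-compatibility for simplicity:

```python
def is_supported(device_capability, built_arch_list):
    """Return True if a GPU with the given (major, minor) compute
    capability matches one of the binary's compiled architectures.
    Simplified sketch: PTX forward-compatibility is ignored."""
    major, minor = device_capability
    # Arch entries look like "sm_75" or "compute_37"; compare the numeric part.
    return any(arch.split("_")[1] == f"{major}{minor}" for arch in built_arch_list)

# Arch list from the error message above (CUDA 10.2 binaries):
cu102_archs = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "compute_37"]
print(is_supported((8, 6), cu102_archs))   # RTX 3070 (sm_86) -> False
print(is_supported((7, 5), cu102_archs))   # Turing (sm_75)   -> True
```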

Actually, when I run the command "nvcc --version", I can see that the CUDA toolkit is this version:

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

And when I run the command "nvidia-smi", it shows that I have CUDA version 11.5 with the 495 driver.
I even installed the nightly version via Anaconda, but I still get the same error.

The PyTorch binaries with the CUDA 10.2 runtime ship for these architectures. As already mentioned a few times, you would need to install the PyTorch binaries with the CUDA 11.3 runtime, so select it in the install menu and make sure the install log also shows CUDA 11.3.
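For reference, the pip variant of that selection looked roughly like this at the time (the exact wheel names come from the install selector, so treat this as a sketch):

```shell
# Sketch: install the binaries built against the CUDA 11.3 runtime.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
```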

Thank you so much for your answer.
Actually, I installed the pip version for Linux and did not get that error anymore, so I think the installation succeeded.
But, after running my code, this shows:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution.
What could be the problem?

You could still be hitting the same error if it wasn’t properly resolved, you could be running out of memory, or there is indeed no cuDNN algorithm for your workload (I haven’t seen that before, as so far this error was always caused by other issues).

I have 16GB of memory, and by monitoring it I can see that it never passes the 12GB threshold, so I am not sure which installation step went wrong.
Besides, these are the last lines of the ".log" file that is created after running my code:

)
)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=1)
(features): Linear(in_features=256, out_features=128, bias=True)
(cls_layer): Linear(in_features=128, out_features=2, bias=True)
)
)
0.9033111181635459 : 0 lbl, 0.09668888183645404 : 1 lbl
0.8181309804896235 : 0 lbl, 0.18186901951037654 : 1 lbl
[0.90331113 0.09668888]

And this is the path setup for my CUDA libraries:

LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
if [ ! -d /usr/local/cuda/lib64 ]; then
    LD_LIBRARY_PATH=$HOME/usr/local/cuda/lib64:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH

Your local CUDA toolkit won’t be used unless you build PyTorch from source or build a custom CUDA extension. Verify that a CUDA 11 runtime is used in your binaries via print(torch.version.cuda) and also make sure that sm_80 and sm_86 are returned by torch.cuda.get_arch_list(). If both these requirements are met, please post a minimal executable code snippet which we could use to run into the cudnn error.
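As a plain-Python sketch of that verification (the helper name and the example arch lists are illustrative; in practice you would pass in the actual values of torch.version.cuda and torch.cuda.get_arch_list()):

```python
def ampere_supported(cuda_runtime, arch_list):
    """Sketch of the check above: the binary must ship a CUDA 11.x runtime
    and include the Ampere architectures in its compiled arch list.
    In practice, pass torch.version.cuda and torch.cuda.get_arch_list()."""
    return (cuda_runtime is not None
            and cuda_runtime.startswith("11.")
            and "sm_80" in arch_list
            and "sm_86" in arch_list)

# Example values as they might look for a CUDA 11.3 binary (illustrative):
print(ampere_supported("11.3", ["sm_37", "sm_50", "sm_60", "sm_70",
                                "sm_75", "sm_80", "sm_86"]))   # True
# Example values matching the CUDA 10.2 binaries from the error above:
print(ampere_supported("10.2", ["sm_37", "sm_50", "sm_60", "sm_61",
                                "sm_70", "sm_75"]))            # False
```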