I am trying to train a network on my NVIDIA RTX 3070. I receive the following error:
NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3070 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
I am trying with the latest stable version of PyTorch that works with CUDA 11.1 (I also tried with 10.2, but didn't have any luck).
Could it be that the RTX 3070 requires CUDA 11.2 or higher, while PyTorch only supports up to CUDA 11.1 for the moment?
That seems strange, as Ampere support has been available for a while now, IIRC. If you build the latest version of PyTorch from source, you can set TORCH_CUDA_ARCH_LIST="8.6" in your environment to force a build with sm_86 support.
NIT: TORCH_CUDA_ARCH_LIST will be used while building from source and won't change the shipped compute capabilities in the binaries, which you can get via print(torch.cuda.get_arch_list()).
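For example, inside the environment in question (the versions in the comments are just illustrative):

import torch

print(torch.__version__)           # e.g. 1.9.0+cu111
print(torch.cuda.get_arch_list())  # compute capabilities compiled into the shipped binaries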
That's expected, since Ampere GPUs need CUDA>=11.0
No, the 3070 uses sm_86, which is natively supported in CUDA>=11.1 and is binary compatible with sm_80, so it would already work in CUDA 11.0.
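A minimal sketch to check whether a given device is covered by the shipped binaries, either natively or through the binary-compatible sm_80 kernels:

import torch

major, minor = torch.cuda.get_device_capability(0)  # e.g. (8, 6) for an RTX 3070
arch_list = torch.cuda.get_arch_list()
print(f"device: sm_{major}{minor}, binary supports: {arch_list}")
# sm_86 devices can also run sm_80 kernels, so either entry is sufficient
if f"sm_{major}{minor}" in arch_list or (major == 8 and "sm_80" in arch_list):
    print("this build can run on the device")
else:
    print("install or build a version with the right compute capability")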
No, I don't think so, as it doesn't change the behavior of the binaries and is only used if you build PyTorch from source, which isn't the case if I understand your workflow correctly.
The original error message is raised if you've installed a pip wheel or conda binary that doesn't support your architecture. Based on the message:
it seems you were using a pip wheel with CUDA<=10.2.
How might I verify whether the pip wheel used CUDA<=10.2? I want to learn how to check for this.
In fact, my problem is more nuanced. I can get the code to work when I run it in the python/ipython interpreters, but any time I try to debug with VS Code or PyCharm, I get the error message.
I have been on this for two weeks, trying and re-trying different combinations of PyTorch versions (and NVIDIA drivers / CUDA toolkit / libcudnn). I have checked many times which virtual environments I use and how I select them in the IDE. I have tried everything I know except building from source, and have not been able to resolve this discrepancy on my system.
Environment information: my setup tries to follow the NVIDIA compatibility matrix: driver 470 / toolkit 11.4 / libcudnn 8.2 / PyTorch 1.9+cu111
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.100
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.0
[pip3] torch==1.9.0+cu111 <---- Is this a problem? No CUDA 11.4?
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect
And sys.path looks good. Is there anything specific to IDEs that I may be missing on this?
Based on your cross-post I would also assume that your PyCharm setup is using another env with a different PyTorch installation.
I would thus either create a new virtual env, reinstall PyTorch there, and point PyCharm at it, or make sure to uninstall all PyTorch installations in the current and base environments and reinstall it in the current env only.
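As a quick sanity check (nothing PyCharm-specific, just a sketch), you could run the following from inside the IDE's run/debug configuration and compare the output with what you see in your terminal:

import sys
import torch

print(sys.executable)                         # the interpreter the IDE actually launched
print(torch.__file__)                         # where the imported torch package lives
print(torch.__version__, torch.version.cuda)  # should match your terminal's output

If the paths point at a different env, or the +cuXXX tag differs, the IDE is configured with the wrong interpreter.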
I discovered my virtual environment had problems. When I tried to install packages to it, they would be installed globally, not locally. There was corruption throughout… I deleted it and started from scratch.
I came up with the following steps as a guide for anyone who would like to have a type of cheatsheet to verify their installation:
Nvidia Driver and CUDA Toolkit
If already installed, examine your Nvidia GPU driver version
nvidia-smi
or
cat /proc/driver/nvidia/version
Learn its architecture
sudo lshw -C display
Learn your current Linux kernel
uname -a
Look up the Nvidia Compatibility Matrix to determine the correct driver, toolkit, and libcudnn
We will wait on this until you set up your virtualenv below.
Testing your system's Python setup
First note, the location of the system-wide python interpreter
which python3
Note the location of the system-wide pip
which pip3
See which packages are installed globally (this command will also list packages that were installed via apt-get install)
python3 -m pip list (or alternatively python3 -m pip freeze)
Create virtualenv if not yet created
python3 -m venv name_for_your_env
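Activate it with source name_for_your_env/bin/activate. To confirm the interpreter is really running inside the venv, a quick check:

import sys
# inside a venv, sys.prefix points at the venv directory while
# sys.base_prefix still points at the system installation
print(sys.prefix)
print(sys.base_prefix)
print("in venv:", sys.prefix != sys.base_prefix)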
Usually, a repo will ask you to install its required packages, normally listed in the file requirements.txt. Examine it and become familiar with it. From within your virtual environment, install them via:
python3 -m pip install -r requirements.txt
If not already installed, install PyTorch.
You can get the pip3/conda command from here. Most people recommend conda/docker installs. We are doing pip3 to have more flexibility with the packages we need across different repos.
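After installing, a short sanity check from inside the virtualenv will tell you immediately whether the wheel matches your GPU (a sketch; the exact versions printed depend on what you installed):

import torch

print(torch.__version__)              # should show the +cuXXX tag you installed, e.g. 1.9.0+cu111
print(torch.cuda.is_available())      # True if the driver and shipped runtime are usable
print(torch.cuda.get_device_name(0))  # your GPU model
x = torch.randn(8, 8, device="cuda")  # a tiny op to force actual kernel execution
print((x @ x).sum().item())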
PYTHONPATH is an environment variable that contains additional paths from which Python loads modules/scripts that are not installed as packages (i.e. located outside the standard site-packages directories).
The PYTHONPATH env variable is set in the .bashrc file found in your user folder (the user folder is located at ~/ and the "." means it is a hidden file). Use your favorite editor to open it:
emacs ~/.bashrc
Look to see if you have already set any PYTHONPATH entries.
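You can also inspect what Python actually sees at runtime, which is often quicker than reading .bashrc:

import os
import sys

print(os.environ.get("PYTHONPATH", "<not set>"))  # whatever .bashrc exported, if anything
for p in sys.path:  # the full module search path Python will use
    print(p)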
NVIDIA GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
How would you check the CUDA version within the pip wheel? Currently I have torch 1.9.1 and torchvision 0.10.1 in the virtual environment. When running nvidia-smi I get a CUDA version of 11.4. Any pointers?
torch.cuda.get_arch_list() will return the compute capabilities used in your current PyTorch build and torch.version.cuda will return the used CUDA runtime.
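For example (the versions in the comments are just illustrative):

import torch

print(torch.version.cuda)          # CUDA runtime shipped inside the wheel, e.g. 11.1
print(torch.cuda.get_arch_list())  # compute capabilities the wheel was built for
# Note: nvidia-smi reports the CUDA version supported by the *driver* (11.4 here),
# which can be newer than the runtime inside the wheel; that mismatch is fine.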
It depends a bit on your use case. The pip wheels and conda binaries ship with their own CUDA runtime and you will be able to run PyTorch code without using the local CUDA toolkit (as long as the right CUDA runtime is selected; e.g. for Ampere GPUs you have to use CUDA>=11).
However, if you want to build a custom CUDA extension, the local CUDA toolkit will be used and you should install a matching version to the runtime used in PyTorch.
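To see which local toolkit an extension build would pick up and whether it matches the wheel's runtime, one option (CUDA_HOME is resolved by PyTorch from your environment, so the printed path is system-dependent):

import torch
from torch.utils.cpp_extension import CUDA_HOME  # local toolkit PyTorch would use for extensions

print("wheel runtime:", torch.version.cuda)  # e.g. 11.1
print("local toolkit:", CUDA_HOME)           # e.g. /usr/local/cuda-11.4, or None if not found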
Hi @Ilias_Giannakopoulos, I am facing the same issue with my RTX 3070. I haven't installed torch globally; I installed torch only inside a conda environment, and I'm using Ubuntu as OS. How can I set TORCH_CUDA_ARCH_LIST? Is it possible to add this variable while installing with conda?