[NEED HELP] Trouble with CUDA capability sm_86

rojas70 · August 11, 2021, 6:26am

Thank you.

I discovered my virtual environment had problems. When I tried to install packages into it, they were installed globally rather than locally. There was corruption throughout… I deleted it and started from scratch.

I came up with the following steps as a cheatsheet for anyone who would like to verify their installation:

Nvidia Driver and CUDA Toolkit

If already installed, examine your Nvidia GPU driver version

nvidia-smi

or

cat /proc/driver/nvidia/version

Learn its architecture

sudo lshw -C display

Learn your current Linux kernel

uname -a
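If you prefer to gather these facts in one go, here is a minimal Python sketch that collects the same information as the commands above. The function name `driver_report` is my own invention, and it assumes `nvidia-smi` accepts the `--query-gpu` option (recent drivers do). Anything that is missing on your machine is simply reported as None.

```python
import platform
import shutil
import subprocess
from pathlib import Path


def driver_report():
    """Collect kernel and NVIDIA driver details; missing pieces stay None."""
    report = {"kernel": platform.release(), "driver_proc": None, "gpus": None}

    # Same information as `cat /proc/driver/nvidia/version`.
    proc_file = Path("/proc/driver/nvidia/version")
    try:
        if proc_file.is_file():
            report["driver_proc"] = proc_file.read_text().strip()
    except OSError:
        pass  # file unreadable: leave the entry as None

    # Same information as `nvidia-smi`, limited to GPU name and driver version.
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,driver_version",
                 "--format=csv,noheader"],
                capture_output=True, text=True, check=False, timeout=30,
            )
            if result.returncode == 0:
                report["gpus"] = result.stdout.strip().splitlines()
        except (OSError, subprocess.SubprocessError):
            pass  # tool present but not usable: leave the entry as None
    return report


if __name__ == "__main__":
    for key, value in driver_report().items():
        print(f"{key}: {value}")
```

The kernel release and driver version it prints are the values you look up in the support matrix in the next step.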

Look up the Nvidia Compatibility Matrix to determine the correct driver, toolkit, and libcudnn

Support Matrix - NVIDIA Docs

Support Matrix - NVIDIA Docs (gcc, glibc)

Install Driver

sudo apt install nvidia-driver-XXX

Install CUDA Toolkit

https://developer.nvidia.com/cuda-downloads

Install libcudnnX (useful for deep learning with CUDA)

Installation Guide - NVIDIA Docs

sudo apt install libcudnnX

Install pytorch

We will wait on this until you set up your virtualenv below.

Testing your system’s python setup

First, note the location of the system-wide python interpreter

which python3

Note the location of the system-wide pip

which pip3

What packages are there globally (this command will also list packages that were installed via apt-get install)

python3 -m pip list (or alternatively python3 -m pip freeze)
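The same three checks can also be done from inside Python, which removes any doubt about which interpreter is answering. This is only a sketch: the helper names `interpreter_report` and `installed_packages` are made up for illustration, and it assumes Python 3.8 or newer (for `importlib.metadata`).

```python
import shutil
import sys
from importlib import metadata


def interpreter_report():
    """Show which interpreter is running and what `which` would find."""
    return {
        "running_interpreter": sys.executable,
        "python3_on_path": shutil.which("python3"),  # like `which python3`
        "pip3_on_path": shutil.which("pip3"),        # like `which pip3`
    }


def installed_packages():
    """Return {name: version} for the distributions this interpreter sees."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            packages[name] = dist.version
    return packages


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```

If `running_interpreter` and `python3_on_path` disagree, the shell and your script are not using the same Python, which is the kind of mix-up that leads to packages landing in the wrong place.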

Create virtualenv if not yet created

python3 -m venv name_for_your_env
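Creating the environment is not enough; it also has to be active (on Linux that is typically `source name_for_your_env/bin/activate`, which I did not spell out above). A quick way to confirm that the interpreter you are running belongs to a virtual environment is sketched below. The function name `in_virtualenv` is hypothetical, and the check assumes an environment made with `python3 -m venv` or a recent `virtualenv`.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:     ", sys.prefix)
    print("base prefix:", sys.base_prefix)
    print("inside a virtualenv:", in_virtualenv())
```

If this prints False right before you run pip, the packages will go to the global installation, which is the problem I ran into.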

Usually, a repo will ask you to install its required packages, normally listed in a file called “requirements.txt”. Examine it and become familiar with it. From within your virtual environment, install them via:

python3 -m pip install -r requirements.txt

If not already installed, install pytorch.

You can get the pip3/conda command from here. Most people recommend conda/docker installs. We are using pip3 to have more flexibility with the packages we need for different repos.

Go to https://pytorch.org/ (choose your config). You may get a command like:

Note that if a package is properly installed, it should appear in your virtual_env/lib/pythonX.X/site-packages folder.

Additionally, ensure your pythonpath is properly set (learn more about pythonpath/imports/sys.path here: The Definitive Guide to Python import Statements | Chris Yeh)

PYTHONPATH is an environment variable that contains paths from which to load python modules/scripts that are not binaries (i.e. located.

The pythonpath env variable is set in the .bashrc file found in your user folder (the user folder is located at ~/ and the “.” means it is a hidden file). Use your favorite editor to open it:

emacs ~/.bashrc

Look to see if any PYTHONPATH entries are already set. New paths are appended like this:

export PYTHONPATH=$PYTHONPATH:/new/path1/goes/here:/new/path2/goes/here:
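To see what actually ended up in PYTHONPATH after editing .bashrc (open a new shell or `source ~/.bashrc` first), a small sketch like the following can help. The helper name `pythonpath_entries` is my own, and it only checks whether each entry exists on disk, nothing more.

```python
import os


def pythonpath_entries(raw=None):
    """Split a PYTHONPATH string into (path, exists_on_disk) pairs.

    With raw=None the current PYTHONPATH environment variable is used.
    """
    if raw is None:
        raw = os.environ.get("PYTHONPATH", "")
    entries = []
    for entry in raw.split(os.pathsep):
        if entry == "":
            # Produced by a leading, trailing, or doubled separator.
            continue
        entries.append((entry, os.path.exists(entry)))
    return entries


if __name__ == "__main__":
    for path, exists in pythonpath_entries():
        print(("ok      " if exists else "MISSING ") + path)
```

Entries marked MISSING are usually leftovers from old projects or typos, and are good candidates for removal.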

Sanity Checks for torch/gpu

In your virtualenv, open a python interpreter:

python3 (or, even better, ipython3; you will need to install it first with pip3 install ipython).

Check the system path from which modules are loaded

import sys

sys.path (you should not see undesired paths here).

Import torch

import torch

Double check that this torch module is located inside your virtual environment

import imp

imp.find_module('torch') → should return a path in your virtualenv
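Note that the `imp` module is deprecated and was removed in Python 3.12, so on newer interpreters the check above fails at the import. A sketch of the same check with `importlib.util.find_spec` follows; the helper names `module_location` and `is_inside_env` are hypothetical, and "inside the environment" is taken to mean "under sys.prefix".

```python
import importlib.util
import sys
from pathlib import Path


def module_location(name):
    """Return where `name` would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin  # e.g. .../site-packages/torch/__init__.py
    # Namespace packages have no origin, only search locations.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_inside_env(path):
    """True if `path` lies under sys.prefix (the active environment)."""
    try:
        Path(path).resolve().relative_to(Path(sys.prefix).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    location = module_location("torch")
    if location is None:
        print("torch is not importable from this interpreter")
    else:
        print(location, "| inside env:", is_inside_env(location))
```

`find_spec` only locates the top-level module without importing it, so this works even when the torch install itself is broken.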

Check the version of your torch module and cuda

torch.__version__

torch.version.cuda

Check the supported architectures

torch.cuda.get_arch_list()

Check for the number of gpu detected

torch.cuda.device_count()

Can you read the device?

device=torch.device('cuda:0') # 0 by default; if you have more GPUs, increase the index.
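Creating the device object does not touch the GPU yet, so it can succeed even when the build does not support your card. Below is a rough sketch that rolls the sanity checks above into one function and also launches a tiny computation on the GPU. The function name `gpu_report` and the dictionary keys are my own. Treat `arch_in_build` as a hint only: a card can sometimes run code built for a nearby architecture, so False there does not prove a failure, while an error in the kernel launch does.

```python
def gpu_report():
    """Summarize the torch/CUDA setup; safe to call without torch or a GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if not report["cuda_available"]:
        return report

    major, minor = torch.cuda.get_device_capability(0)
    arch = f"sm_{major}{minor}"  # e.g. sm_86
    report["device_name"] = torch.cuda.get_device_name(0)
    report["device_arch"] = arch
    report["arch_in_build"] = arch in torch.cuda.get_arch_list()

    # Actually run something on the GPU; an unsupported architecture
    # typically shows up here as a RuntimeError.
    try:
        x = torch.ones(2, device="cuda:0")
        report["kernel_result"] = (x + x).sum().item()
    except RuntimeError as err:
        report["kernel_error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run it inside the virtualenv. If `kernel_error` appears in the output, compare `device_arch` against `built_for_cuda` and the output of torch.cuda.get_arch_list() before reinstalling anything.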