torch.cuda.is_available() returns False with both pip and conda installs

Hello,
I am not able to get CUDA working with my PyTorch installation, and I'm not sure which of my steps is wrong. Below are the steps I followed for both conda and pip. Please help me figure out what is going wrong here.

Check nvidia

❯ sudo nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   43C    P8              7W /  170W |      88MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                       

Pure Conda installation test

conda create -n c_install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Channels:
 - pytorch
 - nvidia
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /scratch/vm_admin-c/cache/.conda/envs/c_install

  added / updated specs:
    - pytorch
    - pytorch-cuda=12.4
    - torchaudio
    - torchvision

Installation

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |             main           3 KB
    _openmp_mutex-5.1          |            1_gnu          21 KB
    blas-1.0                   |              mkl           6 KB
    brotli-python-1.0.9        |  py312h6a678d5_8         356 KB
    bzip2-1.0.8                |       h5eee18b_6         262 KB
    ca-certificates-2024.9.24  |       h06a4308_0         130 KB
    certifi-2024.8.30          |  py312h06a4308_0         163 KB
    charset-normalizer-3.3.2   |     pyhd3eb1b0_0          44 KB
    cuda-cudart-12.4.127       |                0         198 KB  nvidia
    cuda-cupti-12.4.127        |                0        16.4 MB  nvidia
    cuda-libraries-12.4.1      |                0           2 KB  nvidia
    cuda-nvrtc-12.4.127        |                0        21.0 MB  nvidia
    cuda-nvtx-12.4.127         |                0          58 KB  nvidia
    cuda-opencl-12.6.77        |                0          25 KB  nvidia
    cuda-runtime-12.4.1        |                0           2 KB  nvidia
    cuda-version-12.6          |                3          16 KB  nvidia
    expat-2.6.3                |       h6a678d5_0         176 KB
    ffmpeg-4.3                 |       hf484d3e_0         9.9 MB  pytorch
    filelock-3.13.1            |  py312h06a4308_0          24 KB
    freetype-2.12.1            |       h4a9f257_0         626 KB
    giflib-5.2.2               |       h5eee18b_0          80 KB
    gmp-6.2.1                  |       h295c915_3         544 KB
    gnutls-3.6.15              |       he1e5248_0         1.0 MB
    idna-3.7                   |  py312h06a4308_0         131 KB
    intel-openmp-2023.1.0      |   hdb19cb5_46306        17.2 MB
    jinja2-3.1.4               |  py312h06a4308_1         355 KB
    jpeg-9e                    |       h5eee18b_3         262 KB
    lame-3.100                 |       h7b6447c_0         323 KB
    lcms2-2.12                 |       h3be6417_0         312 KB
    ld_impl_linux-64-2.40      |       h12ee557_0         710 KB
    lerc-3.0                   |       h295c915_0         196 KB
    libcublas-12.4.5.8         |                0       309.2 MB  nvidia
    libcufft-11.2.1.3          |                0       190.5 MB  nvidia
    libcufile-1.11.1.6         |                0         899 KB  nvidia
    libcurand-10.3.7.77        |                0        39.7 MB  nvidia
    libcusolver-11.6.1.9       |                0       114.0 MB  nvidia
    libcusparse-12.3.1.170     |                0       179.6 MB  nvidia
    libdeflate-1.17            |       h5eee18b_1          64 KB
    libffi-3.4.4               |       h6a678d5_1         141 KB
    libgcc-ng-11.2.0           |       h1234567_1         5.3 MB
    libgomp-11.2.0             |       h1234567_1         474 KB
    libiconv-1.16              |       h5eee18b_3         759 KB
    libidn2-2.3.4              |       h5eee18b_0         146 KB
    libjpeg-turbo-2.0.0        |       h9bf148f_0         950 KB  pytorch
    libnpp-12.2.5.30           |                0       142.8 MB  nvidia
    libnvfatbin-12.6.77        |                0         783 KB  nvidia
    libnvjitlink-12.4.127      |                0        18.2 MB  nvidia
    libnvjpeg-12.3.1.117       |                0         3.0 MB  nvidia
    libpng-1.6.39              |       h5eee18b_0         304 KB
    libstdcxx-ng-11.2.0        |       h1234567_1         4.7 MB
    libtasn1-4.19.0            |       h5eee18b_0          63 KB
    libtiff-4.5.1              |       h6a678d5_0         533 KB
    libunistring-0.9.10        |       h27cfd23_0         536 KB
    libuuid-1.41.5             |       h5eee18b_0          27 KB
    libwebp-1.3.2              |       h11a3e52_0          87 KB
    libwebp-base-1.3.2         |       h5eee18b_1         425 KB
    llvm-openmp-14.0.6         |       h9e868ea_0         4.4 MB
    lz4-c-1.9.4                |       h6a678d5_1         156 KB
    markupsafe-2.1.3           |  py312h5eee18b_0          25 KB
    mkl-2023.1.0               |   h213fc3f_46344       171.5 MB
    mkl-service-2.4.0          |  py312h5eee18b_1          66 KB
    mkl_fft-1.3.11             |  py312h5eee18b_0         205 KB
    mkl_random-1.2.8           |  py312h526ad5a_0         324 KB
    mpmath-1.3.0               |  py312h06a4308_0         988 KB
    ncurses-6.4                |       h6a678d5_0         914 KB
    nettle-3.7.3               |       hbbd107a_1         809 KB
    networkx-3.2.1             |  py312h06a4308_0         2.9 MB
    numpy-2.1.3                |  py312hc5e2394_0          11 KB
    numpy-base-2.1.3           |  py312h0da6c21_0         8.5 MB
    openh264-2.1.1             |       h4ff587b_0         711 KB
    openjpeg-2.5.2             |       he7f1fd0_0         371 KB
    openssl-3.0.15             |       h5eee18b_0         5.2 MB
    pillow-11.0.0              |  py312hfdbf927_0         955 KB
    pip-24.2                   |  py312h06a4308_0         2.8 MB
    pysocks-1.7.1              |  py312h06a4308_0          35 KB
    python-3.12.7              |       h5148396_0        34.6 MB
    pytorch-2.5.1              |py3.12_cuda12.4_cudnn9.1.0_0        1.46 GB  pytorch
    pytorch-cuda-12.4          |       hc786d27_7           7 KB  pytorch
    pytorch-mutex-1.0          |             cuda           3 KB  pytorch
    pyyaml-6.0.2               |  py312h5eee18b_0         217 KB
    readline-8.2               |       h5eee18b_0         357 KB
    requests-2.32.3            |  py312h06a4308_1         123 KB
    setuptools-72.1.0          |  py312h06a4308_0         2.9 MB
    sqlite-3.45.3              |       h5eee18b_0         1.2 MB
    sympy-1.13.2               |  py312h06a4308_0        15.0 MB
    tbb-2021.8.0               |       hdb19cb5_0         1.6 MB
    tk-8.6.14                  |       h39e8969_0         3.4 MB
    torchaudio-2.5.1           |      py312_cu124         6.3 MB  pytorch
    torchtriton-3.1.0          |            py312       233.6 MB  pytorch
    torchvision-0.20.1         |      py312_cu124         8.4 MB  pytorch
    typing_extensions-4.11.0   |  py312h06a4308_0          71 KB
    tzdata-2024b               |       h04d1e81_0         115 KB
    urllib3-2.2.3              |  py312h06a4308_0         228 KB
    wheel-0.44.0               |  py312h06a4308_0         141 KB
    xz-5.4.6                   |       h5eee18b_1         643 KB
    yaml-0.2.5                 |       h7b6447c_0          75 KB
    zlib-1.2.13                |       h5eee18b_1         111 KB
    zstd-1.5.6                 |       hc292b87_0         664 KB
    ------------------------------------------------------------

Testing installation

conda activate c_install

❯ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
2.5.1
False
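For reference, a slightly more verbose diagnostic than the one-liner above can help narrow this down. This is just a sketch using the public torch.cuda API; it assumes nothing beyond the install shown above:

```python
import torch

print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None would indicate a CPU-only build
print("device count:", torch.cuda.device_count())
try:
    # Force driver initialization; this often surfaces the underlying
    # error message instead of a silent False from is_available().
    torch.cuda.init()
    print("CUDA initialized OK")
except (AssertionError, RuntimeError) as exc:
    print("CUDA init failed:", exc)
```

Since torch.version.cuda is populated here (the conda build is py3.12_cuda12.4), the failure is environmental rather than a CPU-only wheel.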

Pip Installation

Create conda env with just python

conda create -n p_install python=3.10

Output of the Python installation

Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /scratch/vm_admin-c/cache/.conda/envs/p_install

  added / updated specs:
    - python=3.10


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-24.2                   |  py310h06a4308_0         2.3 MB
    python-3.10.15             |       he870216_1        26.8 MB
    setuptools-75.1.0          |  py310h06a4308_0         1.7 MB
    wheel-0.44.0               |  py310h06a4308_0         109 KB
    ------------------------------------------------------------
                                           Total:        30.9 MB

Check environment

conda activate p_install

which pip3

/scratch/user/cache/.conda/envs/p_install/bin/pip3

Install

pip3 install torch

Collecting torch
  Using cached torch-2.5.1-cp310-cp310-manylinux1_x86_64.whl.metadata (28 kB)
Collecting filelock (from torch)
  Using cached filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting networkx (from torch)
  Using cached networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch)
  Using cached jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch)
  Using cached fsspec-2024.10.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Using cached nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Using cached nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Using cached nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Using cached nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch)
  Using cached nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch)
  Using cached nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.21.5 (from torch)
  Using cached nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvtx-cu12==12.4.127 (from torch)
  Using cached nvidia_nvtx_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting nvidia-nvjitlink-cu12==12.4.127 (from torch)
  Using cached nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting triton==3.1.0 (from torch)
  Using cached triton-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting sympy==1.13.1 (from torch)
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch)
  Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Using cached MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Using cached torch-2.5.1-cp310-cp310-manylinux1_x86_64.whl (906.4 MB)
Using cached nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB)
Using cached nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB)
Using cached nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB)
Using cached nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB)
Using cached nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
Using cached nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB)
Using cached nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB)
Using cached nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB)
Using cached nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB)
Using cached nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
Using cached nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)
Using cached nvidia_nvtx_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (99 kB)
Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Using cached triton-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.5 MB)
Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Using cached filelock-3.16.1-py3-none-any.whl (16 kB)
Using cached fsspec-2024.10.0-py3-none-any.whl (179 kB)
Using cached jinja2-3.1.4-py3-none-any.whl (133 kB)
Using cached networkx-3.4.2-py3-none-any.whl (1.7 MB)
Using cached MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20 kB)
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
Successfully installed MarkupSafe-3.0.2 filelock-3.16.1 fsspec-2024.10.0 jinja2-3.1.4 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.4.127 sympy-1.13.1 torch-2.5.1 triton-3.1.0 typing-extensions-4.12.2

Check CUDA

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
/scratch/user/cache/.conda/envs/p_install/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  cpu = _conversion_method_template(device=torch.device("cpu"))

2.5.1+cu124
False

Install numpy too, since the warning above shows it is missing

pip3 install numpy
Collecting numpy
  Downloading numpy-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Downloading numpy-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.3/16.3 MB 5.1 MB/s eta 0:00:00
Installing collected packages: numpy
Successfully installed numpy-2.1.3

Check installation

❯ python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
2.5.1+cu124
False

Build any CUDA sample (in case you have installed a full CUDA toolkit) to verify your driver can communicate with your GPU.

Thanks for the tip. I checked that the local CUDA toolkit installation was OK; PATH and LD_LIBRARY_PATH seem to be set according to the documentation.

I cloned the cuda-samples repo and ran the deviceQuery sample, and this is where things get interesting.

I am logged in to the machine via SSH only (terminal, no desktop session), and this is the output

Compiling

deviceQuery>conda activate torch
(torch) :deviceQuery>make

/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -o deviceQuery deviceQuery.o
mkdir -p ../../../bin/x86_64/linux/release
cp deviceQuery ../../../bin/x86_64/linux/release

Running

(torch) :deviceQuery>./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL
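Error 100 from cudaGetDeviceCount is cudaErrorNoDevice, i.e. the process cannot see any CUDA device at all. The same query can be made directly against the driver API with a stdlib-only sketch (it assumes libcuda.so.1, which ships with the driver, is on the loader path), which takes both PyTorch and the toolkit out of the picture:

```python
import ctypes

def driver_device_count():
    """Ask the CUDA driver API how many devices it sees, bypassing
    both PyTorch and the CUDA runtime. An error code of 100 here
    corresponds to CUDA_ERROR_NO_DEVICE, matching what deviceQuery
    reported above."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None, "libcuda.so.1 not found (driver not installed?)"
    rc = libcuda.cuInit(0)
    if rc != 0:
        return None, f"cuInit failed with error {rc}"
    count = ctypes.c_int(0)
    rc = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return None, f"cuDeviceGetCount failed with error {rc}"
    return count.value, "ok"

print(driver_device_count())
```

If cuInit itself fails with 100 in the SSH session, the problem is below every framework, at the driver/device-node level.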

Then I logged in to the machine via remote desktop, ran ./deviceQuery again, and now


 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          12.4 / 12.4
  CUDA Capability Major/Minor version number:    8.6
 
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 12.4, NumDevs = 1
Result = PASS

After this I ran deviceQuery in the SSH session and it works!

But after logging out of the desktop session, it's failing again.

So the question is: how do I enable CUDA even when I am not inside a desktop session?

Found the issue: the user needs to be part of the video group in order to use these CUDA packages. After adding the user to the group, it works fine even via SSH.

For others coming here:

  • Ensure that your user is part of the video group
  • To test, see if you can run nvidia-smi without sudo
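A quick way to inspect the device-node permissions behind this. The path and group name below are assumptions (on many distros the nodes are owned root:video with mode 0660, and you would add yourself with something like sudo usermod -aG video $USER followed by a re-login):

```python
import grp
import os
import stat

def gpu_access_report(dev_path="/dev/nvidia0"):
    """Report whether the current user can open a GPU device node."""
    if not os.path.exists(dev_path):
        return f"{dev_path} does not exist (driver module not loaded?)"
    st = os.stat(dev_path)
    group = grp.getgrgid(st.st_gid).gr_name  # typically "video" on these setups
    mode = stat.filemode(st.st_mode)
    can_open = os.access(dev_path, os.R_OK | os.W_OK)
    return f"{dev_path}: mode={mode} group={group} current user can open: {can_open}"

print(gpu_access_report())
```

If "current user can open" is False and the group is video, that matches the fix above: membership in that group is what lets a non-desktop (SSH) session talk to the GPU.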