PyTorch not compiled with CUDA when installed via conda

Hello,
I am getting the following error when I run this:

>>> x = torch.tensor(1).cuda()
AssertionError: Torch not compiled with CUDA enabled
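
For reference, here is a minimal check (a sketch; the printed values depend on the install) of what the installed binary reports:

import torch
print(torch.__version__)          # 1.12.1 in my env
print(torch.version.cuda)         # None for a CPU-only build, e.g. "11.7" for a CUDA build
print(torch.cuda.is_available())  # False here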

I know that this is a known issue, but none of the solutions I found online have worked for me. Following are the details of my setup:

nvidia-smi
Sun Mar 12 22:59:10 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB            On | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0               33W / 300W|      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB            On | 00000000:00:05.0 Off |                    0 |
| N/A   35C    P0               32W / 300W|      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

nvcc -V gives

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

I installed PyTorch using the command from the PyTorch website -

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

I have pytorch-cuda installed in my env -
conda list | grep cuda gives -

cuda                      11.7.1                        0    nvidia
cuda-cccl                 11.7.91                       0    nvidia
cuda-command-line-tools   11.7.1                        0    nvidia
cuda-compiler             11.7.1                        0    nvidia
cuda-cudart               11.7.99                       0    nvidia
cuda-cudart-dev           11.7.99                       0    nvidia
cuda-cuobjdump            11.7.91                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-cuxxfilt             11.7.91                       0    nvidia
cuda-demo-suite           12.1.55                       0    nvidia
cuda-documentation        12.1.55                       0    nvidia
cuda-driver-dev           11.7.99                       0    nvidia
cuda-gdb                  12.1.55                       0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-libraries-dev        11.7.1                        0    nvidia
cuda-memcheck             11.8.86                       0    nvidia
cuda-nsight               12.1.55                       0    nvidia
cuda-nsight-compute       12.1.0                        0    nvidia
cuda-nvcc                 11.7.99                       0    nvidia
cuda-nvdisasm             12.1.55                       0    nvidia
cuda-nvml-dev             11.7.91                       0    nvidia
cuda-nvprof               12.1.55                       0    nvidia
cuda-nvprune              11.7.91                       0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvrtc-dev            11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-nvvp                 12.1.55                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
cuda-sanitizer-api        12.1.55                       0    nvidia
cuda-toolkit              11.7.1                        0    nvidia
cuda-tools                11.7.1                        0    nvidia
cuda-visual-tools         11.7.1                        0    nvidia
cudatoolkit               11.3.1               ha36c431_9    nvidia
pytorch-cuda              11.7                 h67b0de4_1    pytorch
pytorch-mutex             1.0                        cuda    pytorch

python -m torch.utils.collect_env gives

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31

Python version: 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.0-21-cloud-amd64-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 12.1.66
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB

Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] numpydoc==1.5.0
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0           py310h7f8727e_0  
[conda] mkl_fft                   1.3.1           py310hd6ae3a3_0  
[conda] mkl_random                1.2.2           py310h00e6091_0  
[conda] numpy                     1.23.5          py310hd5efca6_0  
[conda] numpy-base                1.23.5          py310h8e6c178_0  
[conda] numpydoc                  1.5.0           py310h06a4308_0  
[conda] pytorch                   1.12.1          cpu_py310hb1f1ab4_1  
[conda] pytorch-cuda              11.7                 h67b0de4_1    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.12.1              py310_cu113    pytorch
[conda] torchvision               0.13.1              py310_cu113    pytorch

System information (from hostnamectl) is the following -

   Static hostname: debian
Transient hostname: megh-gpu-2-v100s-16-vcpus
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 52d373a32c64a329a0d690441b0f70a2
           Boot ID: 844478b6a51748d98fa0725631603484
    Virtualization: kvm
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-21-cloud-amd64
      Architecture: x86-64

I am guessing the issue is due to some version mismatch, but I am not sure how, since I have CUDA 12 on my system and I know it is backward compatible.

I would appreciate any help with regards to this, thanks!

You have installed the CPU-only binary as reported here:

[conda] pytorch                   1.12.1          cpu_py310hb1f1ab4_1  

so uninstall it and reinstall the PyTorch binary with the desired CUDA runtime. If you get stuck, try to create a new and empty virtual environment and reinstall the binaries there.
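
After reinstalling, you can double check the build from Python, e.g. (a minimal check; the exact output depends on the build):

import torch
print(torch.__config__.show())  # the build summary lists CUDA/cuDNN for a GPU-enabled binary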

I tried creating a fresh env and again tried to install the latest PyTorch, but it seems like conda only installs the CPU-only binary. Is there any way to solve this? Is this some issue with the CUDA installation on my machine?

Your local CUDA toolkit won’t be used unless you build PyTorch from source or a custom CUDA extension, since the binaries ship with their own CUDA runtime, cuDNN, cuBLAS, NCCL, etc. dependencies. You would only need a properly installed NVIDIA driver to execute PyTorch.
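
For example, the two versions printed by this small check (a sketch; nvcc may not even be on your PATH) can differ without causing any issue:

import shutil, subprocess
import torch

print("bundled CUDA runtime:", torch.version.cuda)  # the version PyTorch was built with and actually uses
nvcc = shutil.which("nvcc")
if nvcc:
    # the local toolkit only matters for source builds or custom CUDA extensions
    print(subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout)
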
The posted command works for me in a new environment (using Python 3.8) and installs the right packages:

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
...
The following NEW packages will be INSTALLED:
...
  pytorch            pytorch/linux-64::pytorch-1.13.1-py3.8_cuda11.7_cudnn8.5.0_0
  pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cuda
  requests           conda-forge/noarch::requests-2.28.2-pyhd8ed1ab_0
  svt-av1            conda-forge/linux-64::svt-av1-1.4.1-hcb278e6_0
  torchaudio         pytorch/linux-64::torchaudio-0.13.1-py38_cu117
  torchvision        pytorch/linux-64::torchvision-0.14.1-py38_cu117

As shown, the CUDA 11.7 binaries are properly selected.
Also note that in your environment the command installed an older 1.12.1 release instead of the current stable 1.13.1 release, as well as 3rd party libs (torchaudio and torchvision) built with CUDA 11.3.
I don’t know why conda fails to install the desired version, so could you post the install output showing all dependencies?


Thanks, I am using the standard command from the pytorch.org website (the same one posted above).

When I run that command I get -

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 22.9.0
  latest version: 23.1.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/megh.bhalerao/pkgs/anaconda3/envs/smrsum

  added / updated specs:
    - pytorch
    - pytorch-cuda=11.7
    - torchaudio
    - torchvision


The following NEW packages will be INSTALLED:

  cuda               nvidia/linux-64::cuda-11.7.1-0 None
  cuda-cccl          nvidia/linux-64::cuda-cccl-11.7.91-0 None
  cuda-command-line~ nvidia/linux-64::cuda-command-line-tools-11.7.1-0 None
  cuda-compiler      nvidia/linux-64::cuda-compiler-11.7.1-0 None
  cuda-cudart        nvidia/linux-64::cuda-cudart-11.7.99-0 None
  cuda-cudart-dev    nvidia/linux-64::cuda-cudart-dev-11.7.99-0 None
  cuda-cuobjdump     nvidia/linux-64::cuda-cuobjdump-11.7.91-0 None
  cuda-cupti         nvidia/linux-64::cuda-cupti-11.7.101-0 None
  cuda-cuxxfilt      nvidia/linux-64::cuda-cuxxfilt-11.7.91-0 None
  cuda-demo-suite    nvidia/linux-64::cuda-demo-suite-12.1.55-0 None
  cuda-documentation nvidia/linux-64::cuda-documentation-12.1.55-0 None
  cuda-driver-dev    nvidia/linux-64::cuda-driver-dev-11.7.99-0 None
  cuda-gdb           nvidia/linux-64::cuda-gdb-12.1.55-0 None
  cuda-libraries     nvidia/linux-64::cuda-libraries-11.7.1-0 None
  cuda-libraries-dev nvidia/linux-64::cuda-libraries-dev-11.7.1-0 None
  cuda-memcheck      nvidia/linux-64::cuda-memcheck-11.8.86-0 None
  cuda-nsight        nvidia/linux-64::cuda-nsight-12.1.55-0 None
  cuda-nsight-compu~ nvidia/linux-64::cuda-nsight-compute-12.1.0-0 None
  cuda-nvcc          nvidia/linux-64::cuda-nvcc-11.7.99-0 None
  cuda-nvdisasm      nvidia/linux-64::cuda-nvdisasm-12.1.55-0 None
  cuda-nvml-dev      nvidia/linux-64::cuda-nvml-dev-11.7.91-0 None
  cuda-nvprof        nvidia/linux-64::cuda-nvprof-12.1.55-0 None
  cuda-nvprune       nvidia/linux-64::cuda-nvprune-11.7.91-0 None
  cuda-nvrtc         nvidia/linux-64::cuda-nvrtc-11.7.99-0 None
  cuda-nvrtc-dev     nvidia/linux-64::cuda-nvrtc-dev-11.7.99-0 None
  cuda-nvtx          nvidia/linux-64::cuda-nvtx-11.7.91-0 None
  cuda-nvvp          nvidia/linux-64::cuda-nvvp-12.1.55-0 None
  cuda-runtime       nvidia/linux-64::cuda-runtime-11.7.1-0 None
  cuda-sanitizer-api nvidia/linux-64::cuda-sanitizer-api-12.1.55-0 None
  cuda-toolkit       nvidia/linux-64::cuda-toolkit-11.7.1-0 None
  cuda-tools         nvidia/linux-64::cuda-tools-11.7.1-0 None
  cuda-visual-tools  nvidia/linux-64::cuda-visual-tools-11.7.1-0 None
  cudatoolkit        nvidia/linux-64::cudatoolkit-11.3.1-ha36c431_9 None
  ffmpeg             pytorch/linux-64::ffmpeg-4.3-hf484d3e_0 None
  gds-tools          nvidia/linux-64::gds-tools-1.6.0.25-0 None
  gnutls             pkgs/main/linux-64::gnutls-3.6.15-he1e5248_0 None
  lame               pkgs/main/linux-64::lame-3.100-h7b6447c_0 None
  libcublas          nvidia/linux-64::libcublas-11.10.3.66-0 None
  libcublas-dev      nvidia/linux-64::libcublas-dev-11.10.3.66-0 None
  libcufft           nvidia/linux-64::libcufft-10.7.2.124-h4fbf590_0 None
  libcufft-dev       nvidia/linux-64::libcufft-dev-10.7.2.124-h98a8f43_0 None
  libcufile          nvidia/linux-64::libcufile-1.6.0.25-0 None
  libcufile-dev      nvidia/linux-64::libcufile-dev-1.6.0.25-0 None
  libcurand          nvidia/linux-64::libcurand-10.3.2.56-0 None
  libcurand-dev      nvidia/linux-64::libcurand-dev-10.3.2.56-0 None
  libcusolver        nvidia/linux-64::libcusolver-11.4.0.1-0 None
  libcusolver-dev    nvidia/linux-64::libcusolver-dev-11.4.0.1-0 None
  libcusparse        nvidia/linux-64::libcusparse-11.7.4.91-0 None
  libcusparse-dev    nvidia/linux-64::libcusparse-dev-11.7.4.91-0 None
  libiconv           pkgs/main/linux-64::libiconv-1.16-h7f8727e_2 None
  libidn2            pkgs/main/linux-64::libidn2-2.3.2-h7f8727e_0 None
  libnpp             nvidia/linux-64::libnpp-11.7.4.75-0 None
  libnpp-dev         nvidia/linux-64::libnpp-dev-11.7.4.75-0 None
  libnvjpeg          nvidia/linux-64::libnvjpeg-11.8.0.2-0 None
  libnvjpeg-dev      nvidia/linux-64::libnvjpeg-dev-11.8.0.2-0 None
  libtasn1           pkgs/main/linux-64::libtasn1-4.16.0-h27cfd23_0 None
  libunistring       pkgs/main/linux-64::libunistring-0.9.10-h27cfd23_0 None
  nettle             pkgs/main/linux-64::nettle-3.7.3-hbbd107a_1 None
  nsight-compute     nvidia/linux-64::nsight-compute-2023.1.0.15-0 None
  openh264           pkgs/main/linux-64::openh264-2.1.1-h4ff587b_0 None
  pytorch-cuda       pytorch/noarch::pytorch-cuda-11.7-h67b0de4_1 None
  pytorch-mutex      pytorch/noarch::pytorch-mutex-1.0-cuda None
  torchaudio         pytorch/linux-64::torchaudio-0.12.1-py310_cu113 None
  torchvision        pytorch/linux-64::torchvision-0.13.1-py310_cu113 None


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: \ By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

done
Retrieving notices: ...working... done

To me it seems as though the pytorch package itself is not getting downloaded at all. Please let me know if I am missing anything.
Thanks!

Update -
When I install PyTorch via pip3 install torch torchvision torchaudio inside the env I created using conda, I am now able to run on the GPU, i.e. torch.cuda.is_available() returns True. I am not sure why I was facing an issue with conda.
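
For example, a quick GPU smoke test along these lines now works:

import torch

print(torch.cuda.get_device_name(0))        # Tesla V100-SXM2-16GB
x = torch.randn(1024, 1024, device="cuda")  # allocate directly on the GPU
y = x @ x                                   # matmul runs on the GPU
print(y.device)                             # cuda:0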

In case the following information helps - I installed CUDA following the documentation here - cuda-installation-guide-linux 12.1 documentation

It was a bare-bones machine which did not have CUDA installed. Was it some issue with the way I installed CUDA that prevented conda from installing the right binary? I followed the instructions on the above webpage exactly.

Below is my new collect_env output -

Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.31

Python version: 3.10.9 (main, Mar  8 2023, 10:47:38) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.10.0-21-cloud-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB

Nvidia driver version: 530.30.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[conda] numpy                     1.24.2                   pypi_0    pypi
[conda] pytorch-cuda              11.7                 h67b0de4_1    pytorch
[conda] torch                     1.13.1                   pypi_0    pypi
[conda] torchaudio                0.13.1                   pypi_0    pypi
[conda] torchvision               0.14.1                   pypi_0    pypi

No, since your local CUDA toolkit won’t be used unless you build from source or a custom CUDA extension as described in my previous post.

The conda install command still shows the wrong torchvision and torchaudio versions and does not even show a pytorch package. It also seems you are trying to run this command in the already broken environment, so did you try to create a new and empty virtual environment as suggested previously?

Oh yes, sorry, I do understand that - the machine did not even have the NVIDIA driver installed, so I had to install that too, following the instructions at the link in my previous post.

Yes - when I create a new and fresh env using the command
conda create -n test anaconda
and then run the PyTorch conda installation command from the previous posts, I get exactly the same log message as in the post above -


Your log message doesn't even show PyTorch as an installed library under "The following NEW packages will be INSTALLED", so it seems the pytorch package might be coming from somewhere else (maybe the base environment)?
Compare your install log to mine, which shows that the latest stable release with CUDA 11.7 will be installed here.

Hi - I think I found out what the issue was. I was creating an environment using
conda create -n my_env anaconda while in my base env.
When anaconda is installed as in the above command, it installs the CPU version of pytorch for some reason.
When I instead create a fresh env without anaconda, i.e. conda create -n my_env, and then run the conda install pytorch .... etc. command taken from pytorch.org, it seems to install the CUDA-enabled pytorch correctly.
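
A minimal check along these lines confirms the CUDA build in the fresh env:

import torch
print(torch.version.cuda)             # 11.7 for the conda CUDA build
print(torch.cuda.device_count())      # 2 (two V100s on this machine)
print(torch.cuda.get_device_name(0))  # Tesla V100-SXM2-16GB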

Not sure why anaconda was installing the CPU-only pytorch - I saw this in the conda create logs when I was creating the env.

Thanks!
