$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 68% 72C P2 341W / 420W | 9876MiB / 24576MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 234092 C python3 9871MiB |
+-----------------------------------------------------------------------------+
$ nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
Once I run $ sudo apt install nvidia-cuda-toolkit, nvidia-smi gets removed and PyTorch can't recognize the GPU. To get nvidia-smi back, I run $ sudo apt install nvidia-utils-515-server, but then nvcc gets uninstalled. This looks like a chicken-and-egg problem.
The PyTorch binaries ship with their own CUDA runtime (as well as cuDNN, NCCL etc.) and don’t need a locally installed CUDA toolkit to execute code but only a properly installed NVIDIA driver.
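A quick way to see which of these pieces is actually present on a machine is to check which binaries are on PATH (a stdlib-only sketch: the driver package ships nvidia-smi, the full toolkit ships nvcc):

```python
import shutil

def cuda_components():
    """Report which CUDA-related binaries are on PATH (sketch).

    The NVIDIA driver package ships nvidia-smi; the full CUDA toolkit
    ships nvcc. The PyTorch binaries only need the former at runtime.
    """
    return {
        "driver_tools": shutil.which("nvidia-smi"),  # None if driver utilities are missing
        "nvcc": shutil.which("nvcc"),                # None if no full toolkit is installed
    }

print(cuda_components())
```

If "driver_tools" resolves but "nvcc" is None, that is exactly the state the prebuilt PyTorch binaries expect: GPU execution works, but compiling custom CUDA extensions does not.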
Your local CUDA toolkit (with the compiler) will be used if you build PyTorch from source or a custom CUDA extension.
Based on your described issue, I guess your CUDA toolkit installation uninstalled the NVIDIA driver as well and/or broke it, so try a new full install and make sure CUDA applications work again.
I started off by installing the new PyTorch, and torch.version.cuda shows it is using CUDA 11.7. But the nvcc command is not found. Also, when I try to install other packages that require CUDA_HOME to be set, I get an error saying: OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
How should I get around this?
Should I install the nvidia-cuda-toolkit?
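For the CUDA_HOME error specifically: build helpers commonly resolve the toolkit location with a fallback chain roughly like the one below (a generic sketch of the common pattern, not any particular library's code), so exporting CUDA_HOME once a toolkit is installed is usually enough:

```python
import os
import shutil

def find_cuda_home():
    """Locate a CUDA toolkit the way many build scripts do (sketch)."""
    # 1. Explicit environment variables take priority.
    home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise infer it from nvcc on PATH,
    #    e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional install location.
    if os.path.isdir("/usr/local/cuda"):
        return "/usr/local/cuda"
    return None
```

Note that the prebuilt PyTorch binaries never hit this path at runtime; it only matters when something needs to compile CUDA code.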
When I downloaded the CUDA toolkit 11.7 and tried to install it, I got this message:
Existing package manager installation of the driver found. It is strongly
recommended that you remove this before continuing.
Abort
Continue
The PyTorch binaries ship with their required CUDA runtime dependencies, not a full CUDA toolkit with a compiler. If you want to build PyTorch from source or a custom CUDA extension you would need to install the full CUDA toolkit locally.
I am not building PyTorch from source, just some other repositories that require CUDA.
When I try to install the full CUDA toolkit I get this message from the installer
Existing package manager installation of the driver found. It is strongly
recommended that you remove this before continuing.
Abort
Continue
Should I uninstall the CUDA runtime shipped with PyTorch (along with PyTorch itself), install the full toolkit, and then reinstall PyTorch? Or is there a way to keep things intact and just install the missing pieces of the CUDA toolkit?
No, you don’t need to uninstall any PyTorch binaries or their dependencies. The warning is raised because of your already locally installed CUDA toolkit and driver.
Is there a way to get an nvcc installed and running for the CUDA bound to the PyTorch you guys ship (in conda here, specifically)? Some libraries I’m using rely on nvcc to check for CUDA support, I think, so that would help.
You could install the matching CUDA toolkit from the NVIDIA website or could try to use the conda package.
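For the conda route, the commands would look roughly like this (the package and channel names are assumptions here; check the NVIDIA conda channel for the exact spelling for your CUDA version):

```shell
# Install an nvcc matching torch.version.cuda (11.7 in this thread); the
# package/channel names below are assumptions -- verify before running:
#   conda install -c nvidia cuda-nvcc=11.7
# Then point build scripts at the environment that provides the toolkit:
export CUDA_HOME="${CONDA_PREFIX:-/usr/local/cuda}"
export PATH="$CUDA_HOME/bin:$PATH"
echo "CUDA_HOME=$CUDA_HOME"
```

This keeps the toolkit scoped to the conda environment, so it cannot conflict with the apt-managed driver the way the nvidia-cuda-toolkit package did above.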
Also, checking for nvcc to detect CUDA support is wrong, as these packages would really be checking for a build toolchain, not a usable runtime. Which packages have these checks?
There’s a pretty novice-developed package for performant Poisson blending called fpie. Upon further inspection, I suspect it might be doing the nvcc check at the C++ layer, since the Python seems to be just an abstraction on top, so there might be bigger problems tbh.