What is the interplay between the "nvidia driver" and "cuda"?

Hello All!

I am under the impression that the “nvidia driver” and “cuda” are two separate things.
How do the two work together?

Would there be a use case where somebody uses the “nvidia driver” but not “cuda”?

I ask because after upgrading a laptop to ubuntu 24.04 (and reinstalling pytorch),
pytorch no longer had cuda support (and nvidia-smi was no longer available).

(For context, I install and run pytorch in a conda environment. nvidia-smi runs both
outside and inside the conda environment.)

I had to first install the nvidia driver (sudo ubuntu-drivers install nvidia:550, outside
of conda) and then install pytorch (with cuda, inside of conda).

nvidia-smi reports (outside and inside of conda):

NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4

Running pytorch (inside of conda) I get:

>>> import torch
>>> torch.__version__
'2.6.0+cu126'
>>> torch.version.cuda
'12.6'

Does the “nvidia driver” automatically come bundled with some version of “cuda”?
When, if ever, does the “CUDA Version: 12.4” get used? (I assume that when I run
pytorch I am using cuda 12.6.)

More context: I am only (knowingly) using cuda when I run pytorch.

Thanks for any information about how these two pieces of the system work together.

K. Frank

Yes, these are two separate things if we are talking about the NVIDIA Driver and the CUDA Toolkit. The term “CUDA” on its own can refer to both: the driver as well as the toolkit.
While the NVIDIA Driver makes sure your system can properly communicate with and use your GPU, the CUDA Toolkit ships with math libraries (cuBLAS, cuSOLVER, cuRAND, etc.) as well as the CUDA compiler toolchain (nvcc, ptxas, etc.). Installing a CUDA Toolkit locally allows you to build CUDA applications, such as PyTorch, from source. You may have seen my comments on a few other questions explaining that the PyTorch binaries ship with their own CUDA runtime dependencies (i.e. the CUDA runtime, the CUDA math libs, etc.) and that users only need to properly install an NVIDIA driver.
Now, if you download a CUDA Toolkit from e.g. here, it will ship with the NVIDIA driver as well. During the installation you can choose to install the Toolkit alone or the Driver alongside it. You can also download the NVIDIA driver separately, which will of course be a smaller package.

Yes, if by “cuda” you mean the CUDA Toolkit. I.e. if you mainly want to run PyTorch applications and don’t plan to ever build a CUDA application or a PyTorch 3rd-party library from source, you won’t need to install a CUDA Toolkit and can depend on the CUDA runtime dependencies that are installed during the PyTorch binary installation. While running pip install torch you will see that nvidia-* wheels are pulled into your environment, which PyTorch will then use.
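If you are curious which of these runtime wheels ended up in your environment, here is a minimal sketch (assuming a pip-managed environment; importlib.metadata is part of the Python standard library):

from importlib import metadata

# List the nvidia-* runtime wheels that were pulled in alongside torch.
for dist in metadata.distributions():
    name = dist.metadata["Name"] or ""
    if name.startswith("nvidia-"):
        print(name, dist.version)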

This is expected, as an OS upgrade will most likely require a driver re-installation as well.

This is the right approach. You can install the NVIDIA drivers from apt, or you can use the standalone installers from here. I would strongly advise against mixing these approaches: if you installed the drivers from apt, stick with apt for driver updates etc.

Great! As a quick smoke test, you could create a tensor on the GPU making sure PyTorch can also communicate with the device.
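A minimal sketch of such a smoke test:

import torch

# Allocate a tensor on the GPU, run a kernel, and copy the result back.
assert torch.cuda.is_available(), "PyTorch cannot see a CUDA device"
x = torch.randn(3, 3, device="cuda")
y = x @ x                           # matmul executed on the GPU
print(y.cpu())                      # round trip back to the host
print(torch.cuda.get_device_name(0))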

Yes, as mentioned before, the CUDA Toolkit download ships with the NVIDIA driver as well, and you have the option to select it during the installation. However, based on your description you installed the NVIDIA Driver first and the PyTorch binaries afterwards, which is fine and correct. Note that the PyTorch binaries do not ship with an NVIDIA Driver, so you have to install it beforehand.

I assume you are referring to the “Driver Version: 550.120 CUDA Version: 12.4” output of nvidia-smi. If so, note that nvidia-smi reports the driver version and the CUDA version this driver corresponds to (i.e. the CUDA Toolkit release this driver shipped with). It does not tell you that a full CUDA Toolkit was actually installed (and you can have multiple CUDA Toolkits installed on your system). To check for a locally installed CUDA Toolkit, run nvcc --version or try to build any CUDA sample from source.
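As a sketch of this check from Python (shutil and subprocess are standard library; nvcc will only be found if a full CUDA Toolkit is on your PATH):

import shutil
import subprocess

# nvidia-smi only needs the driver; nvcc needs a locally installed Toolkit.
print("nvidia-smi:", "found" if shutil.which("nvidia-smi") else "missing")
nvcc = shutil.which("nvcc")
if nvcc:
    print(subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout)
else:
    print("nvcc not found - no full CUDA Toolkit on the PATH (PyTorch will still run)")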
You are thus right in assuming that PyTorch will use its own CUDA 12.6.3 runtime dependencies during execution.

Also, you haven’t asked, but just to explain the compatibility a bit more: the NVIDIA Driver is compatible across all minor CUDA updates within a major release. I.e. to run any PyTorch binary built with CUDA 12.x you need to install an NVIDIA Driver >= 525.60.13, as described in the Minor Version Compatibility Docs. Once the driver is installed, you can simply pip install any PyTorch binary (stable, nightly, etc.) you want.
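As a small sketch of this check, reading the driver version via nvidia-smi and comparing it against the CUDA 12.x floor from the compatibility docs:

import subprocess

# Query the installed driver version and compare it against 525.60.13.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip().splitlines()[0]
driver = tuple(int(p) for p in out.split("."))
print("driver", out, "- ok for CUDA 12.x binaries:", driver >= (525, 60, 13))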
NVIDIA Drivers are not compatible across CUDA major updates. I.e. if you are using a CUDA 11.x driver, you won’t be able to run applications compiled and linked against CUDA 12.x dependencies.

Let me know if you have any more questions!

Hi @ptrblck!

Thank you – this clears things up.

Two more, if I may:

Am I correct that I can have the nvidia driver (say, from installing the driver) and the
cuda runtime libraries (say, from installing pytorch with cuda support), but not have
the cuda toolkit installed? That is, would I be able to run pytorch with cuda tensors,
but not be able to run, say, nvcc?

Is it possible, and useful, to have just the nvidia driver (which lets my system talk to the
nvidia card / chip) and not the cuda runtime? For example, maybe an nvidia video card is
used by a window manager or video game but is not running any cuda computations.
Or does video processing always use “cuda” and the cuda libraries under the hood?

Thanks.

K. Frank

Yes, this is a correct and possible setup. To run PyTorch alone you would need to install the NVIDIA Driver and run pip install torch - that’s it. The driver makes sure your system can communicate with the GPU, while the PyTorch binary ships with all needed CUDA runtime libraries.

E.g. creating a CUDATensor in PyTorch would internally:

  • open the PyTorch libs which were installed from the wheel, e.g. libtorch.so, libtorch_cuda.so, libtorch_global_deps.so,
  • open the CUDA Runtime to call cudaMalloc (to allocate the new CUDATensor): libcudart.so is installed from a wheel and is thus located in the same Python env,
  • open the NVIDIA Driver, which is installed outside of your Python env, e.g. at /lib/x86_64-linux-gnu/libcuda.so.1.
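A sketch to make this visible (Linux only, since it reads /proc/self/maps; assumes a CUDA-enabled torch build):

import torch

# Creating a CUDA tensor forces the runtime (libcudart) and the driver
# (libcuda) to be loaded; then list the CUDA-related mappings of this process.
x = torch.empty(1, device="cuda")

with open("/proc/self/maps") as f:
    libs = {line.split()[-1] for line in f if ".so" in line}

for lib in sorted(libs):
    if any(key in lib for key in ("libcudart", "libcuda.so", "libtorch")):
        print(lib)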

Yes, this is also a valid use case: you would install an NVIDIA Driver as well if you want to play video games on Linux. I’m not familiar enough with the internal details and which components are used for rendering and general gaming acceleration. My naive understanding is that OpenGL and Vulkan are still the commonly used APIs for graphics programming, but I’m purely speculating as I’m not deeply familiar with gaming applications.
