Dear community,
I am trying to train a simple model in a Docker container deployed on my local machine. My development container is based on the following image: pytorch/pytorch:2.6.0-cuda12.6-cudnn9-runtime
Whenever I try to use my GPU (NVIDIA RTX 4090) by running
python -c "import torch; print(torch.cuda.is_available())"
I get the error described below:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
Steps I have taken so far:
- nvidia-smi inside the container (VM) returns the correct driver and CUDA version:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07 Driver Version: 572.83 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 Off | Off |
| 0% 36C P8 10W / 450W | 1248MiB / 24564MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
- nvidia-smi on the host:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83 Driver Version: 572.83 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 WDDM | 00000000:01:00.0 Off | Off |
| 0% 37C P8 11W / 450W | 1226MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
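One thing I noticed when comparing the two outputs: inside the VM, nvidia-smi reports its own version as 570.133.07 while the driver version is 572.83, whereas on the host both fields read 572.83. In case it helps, here is a small throwaway snippet (the helper name is my own invention) that pulls the three version fields out of the banner lines pasted above:

```python
import re

def parse_smi_banner(banner: str):
    """Extract (nvidia_smi_version, driver_version, cuda_version) from an
    nvidia-smi banner line. Hypothetical helper, written only to compare
    the two outputs above."""
    m = re.search(
        r"NVIDIA-SMI\s+([\d.]+)\s+Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)",
        banner,
    )
    return m.groups() if m else None

# Banner lines copied from the outputs above:
vm = "| NVIDIA-SMI 570.133.07  Driver Version: 572.83  CUDA Version: 12.8 |"
host = "| NVIDIA-SMI 572.83     Driver Version: 572.83  CUDA Version: 12.8 |"

print(parse_smi_banner(vm))    # ('570.133.07', '572.83', '12.8')
print(parse_smi_banner(host))  # ('572.83', '572.83', '12.8')
```

So the user-space tool version and the driver version only disagree inside the VM; I do not know whether that is relevant to the error.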
- nvcc --version inside the VM returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
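For what it's worth, my understanding (which may be wrong) is that three different CUDA versions show up here: the highest CUDA runtime the driver supports (12.8 from nvidia-smi), the toolkit the image was built against (12.6 from the image tag), and the nvcc bundled in the image (12.1), which should only matter at compile time. A quick sketch of the compatibility check as I understand it:

```python
# Values copied from the outputs above; the interpretation is my assumption.
driver_cuda = "12.8"  # nvidia-smi: highest CUDA runtime the installed driver supports
image_cuda = "12.6"   # pytorch/pytorch:2.6.0-cuda12.6-...: toolkit the wheel targets
nvcc_cuda = "12.1"    # nvcc --version inside the image; compile-time only

def ver(s: str):
    """Turn '12.8' into (12, 8) so versions compare numerically."""
    return tuple(int(part) for part in s.split("."))

# As far as I know, the driver only needs to support at least the image's toolkit:
print(ver(driver_cuda) >= ver(image_cuda))  # True, so this should not be the problem
print(ver(driver_cuda) >= ver(nvcc_cuda))   # True
```

If that reading is correct, the older nvcc in the image should be harmless, which is why I am stuck on where the "named symbol not found" comes from.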
So the GPU itself seems to be recognized correctly inside the container.
I have additionally tried the following:
- Updating nvidia drivers on host with reboot after installation
- Different PyTorch Docker images (2.3.1, 2.5.1, 2.6), both devel and runtime variants
- Manually removing and reinstalling PyTorch inside the VM
Further Setup Details:
Windows 11 24H2
Nvidia RTX4090
Docker Desktop Engine v26.0.0
I do not know what else to try to get PyTorch working with my GPU. Any help is appreciated; I'll gladly provide more details if needed.
Thank you in advance!