Dear community,
I am trying to train a simple model in a Docker container deployed on my local machine. My development container is based on the following image: pytorch/pytorch:2.6.0-cuda12.6-cudnn9-runtime
Whenever I try to use my GPU (NVIDIA RTX 4090) by running
python -c "import torch; print(torch.cuda.is_available())"
I get the error described below:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found
Steps I have taken so far:
- nvidia-smi inside the container (VM) returns the correct driver and CUDA version:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07 Driver Version: 572.83 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 Off | Off |
| 0% 36C P8 10W / 450W | 1248MiB / 24564MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
- nvidia-smi on the host:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.83 Driver Version: 572.83 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 WDDM | 00000000:01:00.0 Off | Off |
| 0% 37C P8 11W / 450W | 1226MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
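One thing I noticed when comparing the two outputs: inside the VM, nvidia-smi reports its own version as 570.133.07 while the driver version is 572.83, whereas on the host both fields read 572.83. In case it helps, here is a small throwaway snippet (the helper name is my own invention) that pulls the three version fields out of the banner lines pasted above:

```python
import re

def parse_smi_banner(banner: str):
    """Extract (nvidia_smi_version, driver_version, cuda_version) from an
    nvidia-smi banner line. Hypothetical helper, written only to compare
    the two outputs above."""
    m = re.search(
        r"NVIDIA-SMI\s+([\d.]+)\s+Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)",
        banner,
    )
    return m.groups() if m else None

# Banner lines copied from the outputs above:
vm = "| NVIDIA-SMI 570.133.07  Driver Version: 572.83  CUDA Version: 12.8 |"
host = "| NVIDIA-SMI 572.83     Driver Version: 572.83  CUDA Version: 12.8 |"

print(parse_smi_banner(vm))    # ('570.133.07', '572.83', '12.8')
print(parse_smi_banner(host))  # ('572.83', '572.83', '12.8')
```

So the user-space tool version and the driver version only disagree inside the VM; I do not know whether that is relevant to the error.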
- nvcc --version inside the VM returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
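For what it's worth, my understanding (which may be wrong) is that three different CUDA versions show up here: the highest CUDA runtime the driver supports (12.8 from nvidia-smi), the toolkit the image was built against (12.6 from the image tag), and the nvcc bundled in the image (12.1), which should only matter at compile time. A quick sketch of the compatibility check as I understand it:

```python
# Values copied from the outputs above; the interpretation is my assumption.
driver_cuda = "12.8"  # nvidia-smi: highest CUDA runtime the installed driver supports
image_cuda = "12.6"   # pytorch/pytorch:2.6.0-cuda12.6-...: toolkit the wheel targets
nvcc_cuda = "12.1"    # nvcc --version inside the image; compile-time only

def ver(s: str):
    """Turn '12.8' into (12, 8) so versions compare numerically."""
    return tuple(int(part) for part in s.split("."))

# As far as I know, the driver only needs to support at least the image's toolkit:
print(ver(driver_cuda) >= ver(image_cuda))  # True, so this should not be the problem
print(ver(driver_cuda) >= ver(nvcc_cuda))   # True
```

If that reading is correct, the older nvcc in the image should be harmless, which is why I am stuck on where the "named symbol not found" comes from.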
So the GPU itself seems to be recognized correctly inside the container.
I have additionally tried the following:
- Updating nvidia drivers on host with reboot after installation
- Different PyTorch Docker images (2.3.1, 2.5.1, 2.6), both devel and runtime variants
- Manually removing and reinstalling PyTorch inside the VM
Further Setup Details:
Windows 11 24H2
Nvidia RTX4090
Docker Desktop Engine v26.0.0
I do not know what else to try to get PyTorch working with my GPU. Any help is appreciated; I'll gladly provide more details if needed.
Thank you in advance!