I’m experiencing CUDA initialization failures with RTX 5090 GPUs in PyTorch Docker containers, while the same setup works perfectly with RTX 4090.
Hardware & Software:

- GPU: NVIDIA GeForce RTX 5090 (8x cards)
- OS: Ubuntu 22.04 (Jammy)
- Kernel: 6.8.0-79-generic
- NVIDIA drivers tested: 570.x.x, 575.64.03, 580.95.05
- CUDA versions: 12.8, 13.0 (based on driver)
What works:

- nvidia-smi shows all 8 GPUs correctly
- Basic CUDA containers work fine (nvidia/cuda:12.1.0-base-ubuntu22.04)
- Direct CUDA C API initialization succeeds (cuInit() works; see the ctypes sketch after this list)
- The same setup works perfectly with RTX 4090 cards
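For reference, this is roughly how I verify driver-level initialization on the host (a minimal Python/ctypes sketch of the same cuInit() check; libcuda.so.1 is the standard Linux driver library, and error handling is omitted):

```python
import ctypes

# Load the user-space NVIDIA driver library (not the CUDA toolkit runtime).
libcuda = ctypes.CDLL("libcuda.so.1")

# cuInit(0) initializes the driver API; a return value of 0 means CUDA_SUCCESS.
print("cuInit:", libcuda.cuInit(0))

# Ask the driver how many devices it can see.
count = ctypes.c_int(0)
libcuda.cuDeviceGetCount(ctypes.byref(count))
print("device count:", count.value)  # expected to match the 8 cards nvidia-smi reports
```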
What fails:

- PyTorch CUDA initialization in Docker containers
- Tested images: pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel, pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime, pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
- Error: "CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu"
- PyTorch reports torch.cuda.is_available() = False but torch.cuda.device_count() = 8 (minimal repro below)
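The check I run inside each of those containers is essentially this (a minimal sketch; the comments describe what I see on the RTX 5090 host):

```python
import torch

# Confirm which PyTorch wheel and CUDA build the image actually ships.
print(torch.__version__, torch.version.cuda)

# On the RTX 5090 host this prints False, and the
# "CUDA driver initialization failed" warning quoted above is emitted.
print("is_available:", torch.cuda.is_available())

# ...yet the device count still comes back as 8.
print("device_count:", torch.cuda.device_count())
```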
Questions:

- Which PyTorch nightly build should I use for RTX 5090 support?
- Is there a specific minimum CUDA or driver version requirement for these cards?
- Any recommended Docker images or installation methods?
Any guidance would be greatly appreciated!