Cannot run PyTorch with CUDA 12.1 on a server with 8 x A100 GPUs


I have installed CUDA 12.1 on a machine with 8 x A100 GPUs, along with the latest PyTorch build for CUDA 12.1.

No matter what I try, I cannot get rid of the following error, which is driving me crazy:

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/ubuntu/.build/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/cuda/ UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /opt/conda/conda-bld/pytorch_1702400430266/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
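
In case it helps with diagnosis, here is a minimal sketch of how the relevant versions could be collected in one place. The helper name `collect_cuda_diagnostics` is hypothetical (it is not part of PyTorch), and the script assumes `nvidia-smi` is on the PATH if the driver is installed; it only reports what it finds and does not fix anything.

```python
import importlib.util
import shutil
import subprocess


def collect_cuda_diagnostics():
    """Gather version info relevant to a CUDA initialization failure.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}

    if info["torch_installed"]:
        import torch

        info["torch_version"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds)
        info["torch_cuda_build"] = torch.version.cuda
        # Both calls report "no GPU" if CUDA initialization fails
        info["cuda_available"] = torch.cuda.is_available()
        info["device_count"] = torch.cuda.device_count()

    # nvidia-smi reports the driver version and the GPUs the driver can see
    smi = shutil.which("nvidia-smi")
    info["nvidia_smi_found"] = smi is not None
    if smi:
        result = subprocess.run(
            [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        info["nvidia_smi_output"] = result.stdout.strip() or result.stderr.strip()

    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_output` lists all eight GPUs while `device_count` is 0, the driver can see the hardware but the CUDA runtime cannot initialize, which would at least narrow down where the failure is.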

Any help would be appreciated!

This looks like the same problem as the one reported here, which could not be reproduced.