Hi PyTorch developers, I am trying to use PyTorch on a GPU; however, it shows ValueError: invalid literal for int() with base 10: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
Additionally, it shows:
/home/seis/ret/prog/anaconda3/envs/cpi/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: Download Drivers | NVIDIA Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/conda/conda-bld/pytorch_1603729066392/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
False
Please suggest which PyTorch version will support an NVIDIA driver reporting "found version 9010". Which PyTorch version is compatible with NVIDIA driver version 9010? Can you please suggest one? The GPU I am using is older, and the engineer does not want to upgrade it to the latest NVIDIA driver.
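As context for the number in the warning: the 9010 is the integer form of the CUDA driver version as returned by cudaDriverGetVersion(), encoded as 1000 * major + 10 * minor, so your driver supports CUDA 9.1. A minimal sketch decoding it (the function name is just illustrative):

```python
def decode_cuda_driver_version(v: int) -> str:
    """Decode the integer reported by cudaDriverGetVersion()
    (encoded as 1000 * major + 10 * minor) into 'major.minor'."""
    major = v // 1000
    minor = (v % 1000) // 10
    return f"{major}.{minor}"

print(decode_cuda_driver_version(9010))  # -> 9.1
```

So you would need a PyTorch build compiled against CUDA 9.1 or older; the "previous versions" page at pytorch.org lists which CUDA toolkit each old release was built with.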
Hello,
I have a Windows operating system, where I am using WSL for running Linux-based code. I have NCCL installed and it is working fine for distributed PyTorch; I have checked it using a Python script.
Now I have installed torch. When I import torch and check torch.cuda.is_available(), I get this error:
/home/joy/miniconda3/envs/VideoMae/lib/python3.8/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory (Triggered internally at /opt/conda/conda-bld/pytorch_1702400431970/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
Sorry for the unclear information. Three days ago I installed NCCL on the system to work with two GPUs. I did all the installation and environment setup; I have 2 GPUs, and I set up the environment on WSL.
It was working fine: torch distributed was active while running the code.
Yesterday I added 2 more identical GPUs to the system and restarted it. When checking, NCCL reported that four GPUs are active, but when I activate the previous environment and try to check whether torch can access the GPUs, I get the error above.
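One way to narrow this down (a diagnostic sketch, not a confirmed fix): since CUDA initialization only started failing after the two new GPUs were added, you could expose one device at a time to PyTorch via CUDA_VISIBLE_DEVICES, set before importing torch, and see which device triggers the "Error 2: out of memory" during initialization:

```python
import os

# Must be set BEFORE importing torch: the CUDA runtime reads this
# environment variable when it is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # rerun with "1", "2", "3" in turn

# On the GPU machine you would then check:
# import torch
# print(torch.cuda.is_available(), torch.cuda.device_count())
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

If a single device (or a particular pair) reproduces the error, that points at that card, its slot, or how WSL is exposing it, rather than at the PyTorch/NCCL installation itself.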