Good day all,
I know there have been many answers for similar questions, however I havn’t found a solution. My cuda_is_available() is returning False, but weirdly it used to return True with the same configuration. The code is running on a GPU cluster. I was wondering if any people wiser than me can identify anything wrong with the following configuration:
PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.0
ROCM used to build PyTorch: N/A
OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.9.0
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.6 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-PCIE-16GB
GPU 1: Tesla V100-PCIE-16GB
GPU 2: Tesla V100-PCIE-16GB
Nvidia driver version: 418.40.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] numpydoc==1.1.0
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.0.130 0
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.1.0 py36h23d657b_0
[conda] mkl_random 1.1.1 py36h0573a6f_0
[conda] numpy 1.18.5 py36ha1c710e_0
[conda] numpy-base 1.18.5 py36hde5b4d6_0
[conda] numpydoc 1.1.0 py_0
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] torchvision 0.5.0 py36_cu100 pytorch
Here is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3B:00.0 Off | Off |
| N/A 31C P0 25W / 250W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... On | 00000000:AF:00.0 Off | Off |
| N/A 32C P0 27W / 250W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE... On | 00000000:D8:00.0 Off | Off |
| N/A 30C P0 22W / 250W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I appreciate any assistance. Thank you!