Hi All,
We are stuck on a PyTorch installation on our server. Below is the output of collect.py:
(collect.py is the environment-collection script from: https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(gpu_env) python collect.py
Collecting environment information…
/opt/platformx/sentiment_analysis/gpu_env/lib64/python3.8/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux 8.6 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28
Python version: 3.8.12 (default, Sep 16 2021, 10:46:05) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] (64-bit runtime)
Python platform: Linux-4.18.0-372.13.1.el8_6.x86_64-x86_64-with-glibc2.2.5
Is CUDA available: False
CUDA runtime version: 11.4.48
GPU models and configuration: GPU 0: GRID M6-4Q
Nvidia driver version: 470.82.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.11.0+cu113
[conda] Could not collect
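For a quicker check than the full collect_env.py script, the CUDA-related fields above can be gathered with a short snippet like the one below. It is a minimal sketch that assumes it runs inside the same gpu_env virtual environment. The helper name `cuda_summary` is hypothetical; `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard torch attributes.

```python
import importlib.util


def cuda_summary() -> dict:
    """Collect the CUDA-related fields that collect_env.py also reports.

    cuda_summary is a hypothetical helper name used only for illustration.
    """
    # Avoid an ImportError if torch is missing from the active environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built with, e.g. '11.3' for +cu113; None for CPU builds.
        "built_with_cuda": torch.version.cuda,
        # False here corresponds to the "CUDA driver initialization failed" warning.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```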
What we have tried:
- Installing torch==1.11.0+cu113, torch==1.12.0+cu113, torch==1.11.0+cu102 and torch==1.12.0+cu102
- Installing from .whl files for Python 3.8 and cu113
- Upgrading pip and pip3
- Using a fresh virtual environment
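For reference, the pip-only route listed above (no sudo, no Anaconda) can be scripted so that the wheel always lands in the interpreter of the active virtual environment. This is a sketch: `pip_install_cmd` and the `--run` switch are hypothetical names for illustration, and the `--extra-index-url` value assumes PyTorch's `https://download.pytorch.org/whl/<cuda tag>` wheel index.

```python
import subprocess
import sys
from typing import List


def pip_install_cmd(torch_version: str, cuda_tag: str) -> List[str]:
    """Build a pip command that installs a CUDA-enabled torch wheel.

    Using sys.executable ties the install to the current interpreter, so inside
    a virtual environment no sudo is needed. (Hypothetical helper name.)
    """
    return [
        sys.executable, "-m", "pip", "install",
        f"torch=={torch_version}+{cuda_tag}",
        "--extra-index-url", f"https://download.pytorch.org/whl/{cuda_tag}",
    ]


if __name__ == "__main__":
    cmd = pip_install_cmd("1.11.0", "cu113")
    print(" ".join(cmd))
    # Only run the install when explicitly asked (hypothetical --run switch).
    if "--run" in sys.argv:
        subprocess.check_call(cmd)
```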
We know of two other options, but are not sure whether they would work:
- Downgrading the CUDA toolkit from 11.4 to 11.3
- Building PyTorch from source against CUDA 11.4
We also cannot use Anaconda; only pip is allowed.
Both of these options require sudo permissions, which we don't have, so we would appreciate any alternatives or better solutions.
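On the CUDA version question: the `+cu113` and `+cu102` pip wheels bundle their own CUDA runtime libraries, so the locally installed 11.4 toolkit is not what they load; the binding constraint is the NVIDIA driver version. The sketch below compares the driver reported above (470.82.01) against minimum Linux driver versions per toolkit. The table values are taken from NVIDIA's CUDA Toolkit release notes as we understand them and should be double-checked; `MIN_DRIVER`, `parse_version` and `driver_supports` are hypothetical names.

```python
from typing import Tuple

# Minimum Linux driver version for each CUDA toolkit release (assumed values from
# NVIDIA's CUDA Toolkit release notes; verify against the current table).
MIN_DRIVER = {
    "10.2": "440.33",
    "11.3": "465.19.01",
    "11.4": "470.42.01",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '470.82.01' into (470, 82, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """True if the installed driver meets the minimum for the given CUDA version."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[cuda])


if __name__ == "__main__":
    driver = "470.82.01"  # "Nvidia driver version" from the collect_env output
    for cuda in ("10.2", "11.3", "11.4"):
        print(f"CUDA {cuda}: driver OK = {driver_supports(driver, cuda)}")
```

If all three checks pass, the driver version alone would not explain the "CUDA driver initialization failed" warning, which suggests that downgrading the toolkit to 11.3 is unlikely to help by itself.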
Thanks
nvidia-smi output: