Hi there. Having the RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
trying to train a deep learning model in PyTorch0.
Here’s the environment configuration running "python -m torch.utils.collect_env’:
Collecting environment information…
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-40-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB
Nvidia driver version: 495.29.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h3811e60_0 conda-forge
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39hde0f152_0 conda-forge
[conda] numpy 1.21.5 py39he7a7128_1
[conda] numpy-base 1.21.5 py39hf524024_1
[conda] pytorch 1.11.0 py3.9_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.11.0 py39_cu102 pytorch
[conda] torchvision 0.12.0 py39_cu102 pytorch
I’ve installed prior cudatollkit versions (like 10.2) using anaconda, but the GPU seems to don’t recognize any driver minor then version 11. Any tip to solve the problem ?
Cheers.