RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

monee_h.a · February 24, 2021, 2:40am

Hi, I am getting this error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

I am just trying to follow what is in this tutorial:
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

It works fine on my machine:
[Screenshot: Screen Shot 2021-02-23 at 9.38.24 PM]

but I get an error when I run the same code on the server.

I would appreciate it if someone can help! Thanks

ptrblck · February 24, 2021, 4:56am

Could you post the setup of your server, i.e. the GPU, CUDA version, and PyTorch version you are using, so that we could try to reproduce it, please?
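
One way to gather these details is a small script like the sketch below. The helper name `collect_env` is hypothetical and only used for illustration. It relies on `torch.__version__`, `torch.version.cuda`, and the `torch.cuda` device queries, and it assumes it is run in the same Python environment (or container) that raises the error. PyTorch also ships a more complete report via `python -m torch.utils.collect_env`.

```python
import platform


def collect_env():
    """Return a dict describing the Python, PyTorch, CUDA, and GPU setup.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    # This can differ from the version reported by the driver or nvidia-smi.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Posting this output together with the output of `nvidia-smi` shows both the driver version and the CUDA version PyTorch itself was built with, which are not necessarily the same.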

monee_h.a · February 24, 2021, 4:17pm

GPU: 8x NVIDIA Tesla Volta V100 GPUs with NVlink2
CUDA Version: 11.2
PyTorch Version: pytorch_19.09-py3

Thank you!

monee_h.a · February 24, 2021, 4:25pm

GPU:
NVIDIA-SMI 410.129 Driver Version: 410.129

ptrblck · February 25, 2021, 5:45am

Thanks for the information. I cannot reproduce this issue on a V100 with CUDA 11.2 and am not sure what PyTorch version 19.09 refers to.
Are you using an NGC container of this old version? If so, it wouldn’t ship with CUDA 11.2, so could you post an update about your setup, please?