RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

Hi,

I ran into an error when running a script that uses torch:


23-03-10 17:24:37.413 - INFO: Model [DDPM] is created.
23-03-10 17:24:37.413 - INFO: Initial Model Finished
Traceback (most recent call last):
File "DDM_train.py", line 69, in <module>
diffusion.optimize_parameters()
File "/data/work/Diffusion/DDM/model/model.py", line 53, in optimize_parameters
score, loss = self.netG(self.data, self.loss_lambda)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/work/Diffusion/DDM/model/ddpm_modules/diffusion.py", line 253, in forward
return self.p_losses(x, loss_lambda, *args, **kwargs)
File "/data/work/Diffusion/DDM/model/ddpm_modules/diffusion.py", line 238, in p_losses
code = self.denoise_fn(torch.cat([x_in['S'], x_in['T'], x_t], dim=1), t)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/work/Diffusion/DDM/model/ddpm_modules/unet.py", line 225, in forward
x = layer(x, t)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/work/Diffusion/DDM/model/ddpm_modules/unet.py", line 136, in forward
x = self.attn(x)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/work/Diffusion/DDM/model/ddpm_modules/unet.py", line 117, in forward
context = torch.einsum('bhdn,bhen->bhde', k, v)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
return _VF.einsum(equation, operands)  # type: ignore[attr-defined]

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)

It looks like the problem comes from the einsum call in the attention block?
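
For reference, that call boils down to something like the stripped-down snippet below. The batch/head/channel/sequence sizes are only my guesses, not the actual shapes the U-Net produces at that point:

```python
import torch

device = "cuda:1"
# b, heads, channels, sequence length are guesses, not the real training shapes
k = torch.randn(2, 4, 32, 1024, device=device)
v = torch.randn(2, 4, 32, 1024, device=device)
# same equation as in unet.py line 117 of the traceback
context = torch.einsum('bhdn,bhen->bhde', k, v)
print(context.shape)  # expected: torch.Size([2, 4, 32, 32])
```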

To test my environment, I ran a test script that was posted earlier and got a similar error:

Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.__version__
'1.13.1+cu117'
>>> device = "cuda:1"
>>> print(torch.cuda.get_device_properties(device))
_CudaDeviceProperties(name='NVIDIA RTX A6000', major=8, minor=6, total_memory=48676MB, multi_processor_count=84)
>>> import torch.nn as nn
>>> rr = torch.zeros([2, 20, 5000]).to(device)
>>> layer1 = nn.Conv1d(20, 500, kernel_size=4, stride=4, groups=20, bias=False).to(device)
>>> layer2 = nn.Linear(500, 768).to(device)
>>> l1out = layer1(rr)
>>> l2out = layer2(l1out.transpose(1, 2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)

Here is the output of collect_env.py:
Collecting environment information…
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000

Nvidia driver version: 525.60.13
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] denoising-diffusion-pytorch==1.2.2
[pip3] ema-pytorch==0.2.1
[pip3] lion-pytorch==0.0.5
[pip3] numpy==1.23.1
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] Could not collect

Many thanks for any input on this error when calling "cublasSgemmStridedBatched".

J. L.

Could you post a minimal and executable code snippet to reproduce the issue and wrap it into three backticks ```, please?

Thanks @ptrblck for the prompt response to my question. Here is the snippet I used to reproduce the problem, although it does not do exactly the same thing as the original training script.

>>> import torch
>>> torch.__version__
>>> device = "cuda:1"
>>> print(torch.cuda.get_device_properties(device))
_CudaDeviceProperties(name='NVIDIA RTX A6000', major=8, minor=6, total_memory=48676MB, multi_processor_count=84)
>>> import torch.nn as nn
>>> rr = torch.zeros([2, 20, 5000]).to(device)
>>> layer1 = nn.Conv1d(20, 500, kernel_size=4, stride=4, groups=20, bias=False).to(device)
>>> layer2 = nn.Linear(500, 768).to(device)
>>> l1out = layer1(rr)
>>> l2out = layer2(l1out.transpose(1, 2))

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lomahu/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)

I strongly suspect the problem is related to my versions of torch/tensorflow or to the RTX A6000 GPU cards themselves.
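
As a quick sanity check, I am going to try something like the snippet below (just a guess at a useful isolation step, not part of the original script) to see whether a plain matmul already fails, and whether it fails on both GPUs or only on cuda:1:

```python
import torch

# report the torch build and the CUDA version it was built against
print(torch.__version__, torch.version.cuda)

# run a plain matmul on each GPU to see whether the failure is device-specific
for dev in ["cuda:0", "cuda:1"]:
    a = torch.randn(64, 64, device=dev)
    b = torch.randn(64, 64, device=dev)
    print(dev, (a @ b).sum().item())
```

Your help is much appreciated!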

J.L.

Any solutions so far?
I am also encountering the same issue, especially when I use einsum.