Hello all,
I am using PyTorch '1.13.0+cu117'. My environment is: NVIDIA-SMI 450.80.02, Driver Version: 450.80.02, CUDA Version: 11.0.
In a Python terminal, I tried this very simple example:
>>> import torch
>>> x=torch.ones(2,2,1).to('cuda')
>>> y=torch.ones(2,1,2).to('cuda')
>>> x
tensor([[[1.],
         [1.]],

        [[1.],
         [1.]]], device='cuda:0')
>>> y
tensor([[[1., 1.]],

        [[1., 1.]]], device='cuda:0')
>>> y@x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
>>> torch.bmm(y,x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
>>> torch.matmul(y,x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
>>> x=torch.ones(2,1).to('cuda')
>>> y=torch.ones(1,2).to('cuda')
>>> y@x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
>>> torch.mm(y,x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
>>> torch.mm(x,y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
>>>
The issue is clearly not caused by a shape mismatch. Does anyone have any idea what is going on? Thanks!
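As a sanity check on the shapes (this is a separate sketch, not part of the original session), the exact same multiplications run fine on the CPU, which supports the point that the tensor dimensions themselves are valid:

```python
import torch

# Same shapes as in the failing CUDA session, but on the CPU.
x = torch.ones(2, 2, 1)
y = torch.ones(2, 1, 2)

# Batched matmul: (2, 1, 2) @ (2, 2, 1) -> (2, 1, 1)
print((y @ x).shape)           # torch.Size([2, 1, 1])
print(torch.bmm(y, x).shape)   # torch.Size([2, 1, 1])
print(torch.matmul(y, x).shape)

# Plain 2-D matmul: (1, 2) @ (2, 1) -> (1, 1), and (2, 1) @ (1, 2) -> (2, 2)
x2 = torch.ones(2, 1)
y2 = torch.ones(1, 2)
print(torch.mm(y2, x2).shape)  # torch.Size([1, 1])
print(torch.mm(x2, y2).shape)  # torch.Size([2, 2])
```

Since the CPU path succeeds, the error looks specific to the cuBLAS call on the GPU rather than to the operands.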