Runtime Error: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling 'cublasSgemm'

Could you post an executable code snippet to reproduce this issue, please?

I’m really not sure what happened, but I ended up getting access to a separate GPU on which this problem didn’t appear for identical code. I also haven’t been able to reproduce it, sadly. But I will keep you updated if I can reproduce the error!

Is there any update on this? I also encountered the same problem.

CUDA Toolkit 11.1 release notes mention an issue fixed in cuBLAS:

Fixed an issue that caused an Address out of bounds error when calling cublasSgemm().

For us, cublasSgemm() failed with CUBLAS_STATUS_EXECUTION_FAILED when the binary was built with CUDA 10.0 and run on an Ampere GPU (3060 Ti). The same build ran fine on older GPUs (Pascal, Turing).
It ran successfully on Ampere once we built it with CUDA 11.2.

P.S. This was with another framework/project.
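
In case it helps anyone hitting the same thing, here is a rough sketch of a sanity check that compares the CUDA version a binary was built with against the GPU's compute capability. The table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `parse_version`, `toolkit_supports`) are my own and purely illustrative, not part of any library. The table only covers the architectures mentioned in this thread plus Volta. I am also assuming the toolkit/architecture mismatch is what caused the failure, which I haven't verified.

```python
# Hypothetical helper: earliest CUDA toolkit release with native support
# for each compute capability. Deliberately incomplete; extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, e.g. 3060 Ti)
}


def parse_version(text):
    """Turn a version string such as '11.2' into a tuple such as (11, 2)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, capability):
    """Return True if the given CUDA toolkit natively targets the capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # The combinations described above: 3060 Ti is compute capability 8.6.
    for version in ("10.0", "11.2"):
        ok = toolkit_supports(version, (8, 6))
        print(f"CUDA {version} on sm_86: {'supported' if ok else 'too old'}")
```

In PyTorch the two inputs can be read from `torch.version.cuda` and `torch.cuda.get_device_capability()`; other frameworks will expose them differently.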