RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm` on RTX A5000

I found an example with a minimal script, taken from the official PyTorch examples, here:

https://discuss.pytorch.org/t/trainer-train-stuck-with-rtx-a6000/175093

Maybe we can continue the discussion in that thread instead.