CUDA Runtime Error when running DreamBooth/Stable Diffusion

Hello, I am running DreamBooth on an Ubuntu g4dn.2xlarge instance and hit the following error when training the model. The error and my configuration are below. Thanks!

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

More info:

root:/home/ubuntu# python3 -m torch.utils.collect_env
Collecting environment information…
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.25.0
Libc version: glibc-2.35

Python version: 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1028-aws-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.108.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.1
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
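
One thing that stands out in the report above is that PyTorch was built against CUDA 11.7 while the reported CUDA runtime is 11.5. Whether that difference has anything to do with the cuBLAS error is not established in this thread, and as far as I know the pip wheels ship their own CUDA libraries, so treat it as something to rule out rather than a diagnosis. A minimal sketch for pulling the two versions out of a saved `collect_env` report follows; the helper names `cuda_versions` and `versions_differ` are made up for illustration and are not part of PyTorch.

```python
import re


def cuda_versions(env_text):
    """Return (build_version, runtime_version) as 'major.minor' strings.

    Either element is None when the line is missing or has no version
    number (for example 'Could not collect').
    """
    build = re.search(r"CUDA used to build PyTorch:\s*(\d+\.\d+)", env_text)
    runtime = re.search(r"CUDA runtime version:\s*(\d+\.\d+)", env_text)
    return (
        build.group(1) if build else None,
        runtime.group(1) if runtime else None,
    )


def versions_differ(env_text):
    """True only when both versions were found and they are not equal."""
    build, runtime = cuda_versions(env_text)
    return build is not None and runtime is not None and build != runtime


# Three lines copied from the report in this post.
report = """\
PyTorch version: 1.13.1+cu117
CUDA used to build PyTorch: 11.7
CUDA runtime version: 11.5.119
"""

print(cuda_versions(report))
print(versions_differ(report))
```

For the report in this post it prints `('11.7', '11.5')` and `True`.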

Update: I installed PyTorch 1.12.1+cu116 and still get an error, now with a different cuBLAS status:

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmStridedBatchedExFix(...)
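
To check whether the problem comes from the environment rather than the training script, it may help to run a single half-precision batched matmul on its own. The sketch below assumes that `torch.bmm` on float16 CUDA tensors reaches the same strided batched GEMM call named in the traceback, which I have not verified for every PyTorch build. The function name `fp16_bmm_check` and the tensor sizes are arbitrary choices for illustration.

```python
def fp16_bmm_check(batch=4, m=64, k=64, n=64):
    """Run one float16 batched matmul on the GPU and describe the outcome."""
    try:
        import torch
    except ImportError:
        return "skipped: torch is not installed"

    if not torch.cuda.is_available():
        return "skipped: CUDA is not available"

    try:
        # Shapes are (batch, m, k) @ (batch, k, n) -> (batch, m, n).
        a = torch.randn(batch, m, k, device="cuda", dtype=torch.float16)
        b = torch.randn(batch, k, n, device="cuda", dtype=torch.float16)
        out = torch.bmm(a, b)
        # Kernels launch asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"

    return f"ok: output shape {tuple(out.shape)}"


status = fp16_bmm_check()
print(status)
```

An `ok` result does not rule out the training error, since the shapes used during training may differ from the ones here. A `failed` result with the same cuBLAS status would point at the PyTorch/CUDA/driver setup rather than at DreamBooth itself.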

I get the same error with:

torch=1.13.0
cuda 11.5

Did you fix it?

I was able to fix it, but I changed a bunch of things along the way. Below is a partially finished article I wrote on training the model. It lists the requirements that ended up working for me: https://medium.com/@carter2abq/running-dreambooth-with-scalable-aws-backend-b9a075c658b9