CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

The above error occurs when I use the F.conv2d function.
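
For reference, a minimal sketch of the kind of call that fails is below; the tensor shapes are made up for illustration, since the original failing code is not shown here:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes chosen only for illustration.
x = torch.randn(8, 3, 224, 224, device="cuda")  # NCHW input batch
w = torch.randn(64, 3, 3, 3, device="cuda")     # (out_channels, in_channels, kH, kW)

# In the failing setup, the CUBLAS_STATUS_INTERNAL_ERROR surfaces on a call like this.
out = F.conv2d(x, w, stride=1, padding=1)
print(out.shape)  # expected: torch.Size([8, 64, 224, 224])
```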

(pt1.9.0-cuda11.1) $ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 470.141.03
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.1
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.10.0+cu111
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchfile                 0.1.0                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

Could you update to the latest stable release with the latest CUDA runtime and check if you are still seeing the issue, please?
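
After upgrading, something like the following can confirm which versions the runtime actually picks up and that the cuBLAS path works; the expected values depend on the release and CUDA build you install:

```python
import torch

# Sanity-check the upgraded environment.
print(torch.__version__)               # installed PyTorch version
print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # cuDNN version bundled with the wheel
print(torch.cuda.is_available())       # should be True

# A tiny matmul on the GPU exercises the same cuBLAS GEMM path as the failing call.
a = torch.randn(64, 64, device="cuda")
print((a @ a).sum())
```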

Thanks, updating solved the problem.