CUDA error: CUBLAS_STATUS_EXECUTION_FAILED on cuda 11.8

Hello,
I’m trying to run PyTorch with CUDA 11.8 on a machine with an RTX 3060 card:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P8    12W /  N/A |      5MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2515      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+

However, any time I run my code I get the following error:

File "/home/merlo/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Nothing I could find online helped. What could be the issue? I’m running this PyTorch build:

pytorch                   1.13.0          py3.10_cuda11.7_cudnn8.5.0_0    pytorch

installed directly from the official channel. Can I somehow update it to use CUDA 11.8?

Thanks!

You are using the CUDA 11.7 runtime since you’ve installed the corresponding PyTorch binary. Your local CUDA toolkit will only be used if you are building PyTorch from source or a custom CUDAExtension.
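
You can double-check which runtime your installed binary actually ships with from Python itself (a quick sketch, assuming a standard pip/conda install):

import torch

# CUDA runtime bundled with the PyTorch binary
# (this is what matters, not the toolkit version reported by nvidia-smi)
print(torch.version.cuda)               # e.g. '11.7' for the py3.10_cuda11.7 build
print(torch.backends.cudnn.version())   # bundled cuDNN version, e.g. 8500
print(torch.cuda.is_available())        # should be True if the driver is picked up
print(torch.cuda.get_device_name(0))    # should report the RTX 3060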

In any case, could you post a minimal, executable code snippet to reproduce the issue, please?
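
Something along the lines of the sketch below would already help, assuming the failure shows up in a plain nn.Linear forward; the layer and batch sizes are placeholders you would replace with the ones from your actual model:

import torch
import torch.nn as nn

device = torch.device("cuda")

# Minimal stand-in for the failing forward pass
linear = nn.Linear(128, 64).to(device)
x = torch.randn(32, 128, device=device)

out = linear(x)           # calls F.linear, which dispatches to cublasSgemm
torch.cuda.synchronize()  # force the kernel to finish so the error surfaces here
print(out.shape)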