About CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

When I use torch 1.7.1 and CUDA 11.0 to train a UPerHead transformer, I always run into the following error:

Traceback (most recent call last):
  File "train.py", line 438, in 
    train(trainlog)
  File "train.py", line 240, in train
    tout, lout = net(img)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "train.py", line 103, in forward
    out = self.head(out0)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/train/models/head/uper_head.py", line 96, in forward
    laterals = [
  File "/train/models/head/uper_head.py", line 97, in 
    lateral_conv(inputs[i])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/train/models/losses/utils.py", line 144, in forward
    out = self.conv(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
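
For reference, the failing call reduces to a plain F.conv2d inside one of the UPerHead lateral convs. A standalone sketch of the same kind of call (with hypothetical shapes, since the real sizes are not shown in the traceback) looks like this:

import torch
import torch.nn.functional as F

# Hypothetical shapes; the real input sizes of the lateral conv are not logged.
x = torch.randn(2, 768, 32, 32, device="cuda")       # batch of backbone feature maps
weight = torch.randn(512, 768, 1, 1, device="cuda")  # 1x1 lateral conv weight
bias = torch.randn(512, device="cuda")

out = F.conv2d(x, weight, bias, stride=1)
torch.cuda.synchronize()  # surface any asynchronous CUDA error here
print(out.shape)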

My torch and CUDA environment information is:

Python 3.8.2
1.7.1
Linux pytorch-devel-171-prod-v-f5d57bb16238459eb3644237ada6279e-fqxcx 4.18.0-147.5.2.1.h579.eulerosv2r10.x86_64 #1 SMP Tue May 31 11:58:10 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
Linux version 4.18.0-147.5.2.1.h579.eulerosv2r10.x86_64 (root@dd110da1a451) (gcc version 7.3.0 (GCC)) #1 SMP Tue May 31 11:58:10 CST 2022
Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.2
[pip3] torchelastic==0.2.1
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl    defaults
[conda] cudatoolkit               11.0.221             h6bb024c_0    defaults
[conda] mkl                       2020.2                      256    defaults
[conda] mkl-service               2.3.0            py38he904b0f_0    defaults
[conda] mkl_fft                   1.2.0            py38h23d657b_0    defaults
[conda] mkl_random                1.1.1            py38h0573a6f_0    defaults
[conda] numpy                     1.22.3                   pypi_0    pypi
[conda] pytorch                   1.7.1           py3.8_cuda11.0.221_cudnn8.0.5_0    pytorch
[conda] torchaudio                0.7.2                    pypi_0    pypi
[conda] torchelastic              0.2.1                    pypi_0    pypi
[conda] torchvision               0.8.2                py38_cu110    pytorch

Could you give me some advice on how to solve this problem?

Could you update PyTorch to the latest stable or nightly release and check if you are still running into this error, please?
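
Also, since CUDA operations are executed asynchronously, the stack trace does not necessarily point at the op that actually failed. A minimal sketch (with arbitrary shapes) to force synchronous launches and to exercise the float32 cublasSgemm path named in the error message is:

import os

# Force synchronous kernel launches so the stack trace points at the op that actually fails.
# This has to be set before any CUDA work is done (here: before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Plain float32 matmul on the GPU, which goes through the cuBLAS sgemm path.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b
torch.cuda.synchronize()  # raise any pending asynchronous CUDA error here
print(c.sum().item())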

I understand, but the remote server administrator has not given me admin rights, so I can only advise them on how to change the torch or CUDA version to solve the problem.

Also, the torch version on the remote server is fixed, because a lot of our work depends on it, so only the CUDA version can be changed. Based on your experience, which CUDA version matches this torch version?

When I train the same network on my own local server, the problem does not occur.
The torch and CUDA information on my local server is:

Collecting environment information…
PyTorch version: 1.7.1+cu110
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Amazon Linux 2 (x86_64)
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.106.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip] numpy==1.21.6
[pip] torch==1.7.1+cu110
[pip] torchaudio==0.7.2
[pip] torchvision==0.8.2+cu110
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.6 pypi_0 pypi
[conda] numpy-base 1.21.5 py37hf524024_2
[conda] torch 1.7.1+cu110 pypi_0 pypi
[conda] torchaudio 0.7.2 pypi_0 pypi
[conda] torchvision 0.8.2+cu110 pypi_0 pypi


The issue is that you might be running into a known and already fixed error, whose fix won't be backported to these older PyTorch or cuBLAS versions.

The PyTorch binaries are built for multiple CUDA versions, and you can select the one you want when installing.
Note that your locally installed CUDA toolkit will not be used unless you build PyTorch from source or build a custom CUDA extension, since the binaries ship with their own CUDA runtime.
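
You can check directly from Python which CUDA runtime and cuDNN the installed binary ships with (and which GPU it sees, which is missing from the first environment output above):

import torch

print(torch.__version__)               # e.g. 1.7.1
print(torch.version.cuda)              # CUDA runtime the binary was built with, e.g. 11.0
print(torch.backends.cudnn.version())  # bundled cuDNN version, e.g. 8005
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))   # GPU model visible to this process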

In case you cannot directly install the latest PyTorch pip wheel on the server, you might be able to use e.g. a Docker container and install it there.