CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Haris_Cheong · June 4, 2021, 6:19am

Im having the same issue. I realized that the torch.nn.Linear layers are the problem as when I change it to a fully ocnvolutional network it runs without the cublas error.

System Specs:

NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 (installed with runfile so drivers come together)
Installed pytorch using this command: pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-73-generic x86_64)
GeForce GTX 1080 Ti

To reproduce:

import os
os.environ['CUDA_VISIBLE_DEVICES']='0'
import torch

class Classifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        #classifier
        self.linear_1 = torch.nn.Linear(32*8*4*8,32*8*4)
        self.linear_2 = torch.nn.Linear(32*8*4,32*8)
        self.batch_norm_1 = torch.nn.BatchNorm1d(32*8)

        self.linear_3 = torch.nn.Linear(32*8,32)
        self.linear_4 = torch.nn.Linear(32,1)

        self.activation = torch.nn.ReLU()


    def forward(self,x):
        x = x.reshape(x.shape[0], -1)
        x = self.linear_1(x)
        x = self.activation(x)
        x = self.linear_2(x)
        x = self.activation(x)
        x = self.batch_norm_1(x)

        x = self.linear_3(x)
        x = self.activation(x)
        x = self.linear_4(x)
        return x
    
net = Classifier().cuda()
inp = torch.rand(2,32,8,4,8).cuda()
gt = torch.rand(2,1).cuda()
outp = net(inp)
loss = torch.nn.BCEWithLogitsLoss()(outp, gt)
loss.backward()

Error observed:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-a9db9cb37cd0> in <module>
     35 outp = net(inp)
     36 loss = torch.nn.BCEWithLogitsLoss()(outp, gt)
---> 37 loss.backward()

/disk4/haris/envs/lib/python3.8/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    243                 create_graph=create_graph,
    244                 inputs=inputs)
--> 245         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    246 
    247     def register_hook(self, hook):

/disk4/haris/envs/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    143         retain_graph = create_graph
    144 
--> 145     Variable._execution_engine.run_backward(
    146         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    147         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`