RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

I am training my model on Google Colab with batch_size = 128, and after 1 epoch it hits this error. I don’t know how to fix it while keeping the same batch size (reducing batch_size to 32 avoids the problem). Here are the Colab specs: driver version 460.32.03, CUDA version 11.2.
You can find my notebook here.
Thanks for your help.


It seems that one of your operands is too large to fit in int32 (or negative, but that seems unlikely).

Recent PyTorch versions do at least give a better error:

import torch

# one element past the int32 limit (2**31 - 1)
LARGE = 2**31 + 1

# make each of the three GEMM dimensions too large in turn
for i, j, k in [(1, 1, LARGE), (1, LARGE, 1), (LARGE, 1, 1)]:
    inp = torch.randn(i, k, device="cuda", dtype=torch.half)
    weight = torch.randn(j, k, device="cuda", dtype=torch.half)
    try:
        torch.nn.functional.linear(inp, weight)
    except RuntimeError as e:
        print(e)
    # free the huge tensors before the next iteration
    del inp
    del weight
at::cuda::blas::gemm<float> argument k must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument m must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument n must be non-negative and less than 2147483647 but got 2147483649

But they don’t work around it. (It needs a lot of memory to trigger the bug…)

Maybe you can get a credible backtrace and record the input shapes to the operation that fails.
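One way to do that (a minimal sketch; log_linear_shapes is just a placeholder name I made up, not an existing API) is to register a forward pre-hook on every nn.Linear so the shapes are printed right before the call that fails:

import torch
import torch.nn as nn

def log_linear_shapes(model):
    # print each Linear module and its input shapes right before it runs;
    # the last line printed before the crash identifies the failing call
    def hook(module, inputs):
        print(module, [tuple(t.shape) for t in inputs if torch.is_tensor(t)])
    for m in model.modules():
        if isinstance(m, nn.Linear):
            m.register_forward_pre_hook(hook)

Call log_linear_shapes(model) once before the training loop; running the script with CUDA_LAUNCH_BLOCKING=1 also helps the backtrace point at the real kernel launch rather than a later, unrelated op.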

Best regards

Thomas

So what can I do to solve this problem? All I know is to make the batch size smaller.

In order of difficulty:

  • make the batch size smaller,
  • make a minimal reproducing example (i.e. just two or three inputs from torch.randn and the call to torch.nn.functional.linear) and file a bug,
  • hot-patch torch.nn.functional.linear with a workaround (splitting the operation into multiple linear or matmul calls; see the sketch after this list),
  • submit a PR with a fix in PyTorch and discuss whether you can add a test or whether it would take a prohibitively large amount of GPU memory to run (or hire someone to do so).
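
For the hot-patch option, a rough, untested sketch of the idea; the chunk size and the assumption that the oversized dimension is the leading one are mine, not something from your report:

import torch
import torch.nn.functional as F

_original_linear = F.linear

def chunked_linear(input, weight, bias=None, chunk=65536):
    # small inputs go through the stock implementation unchanged
    if input.dim() < 2 or input.size(0) <= chunk:
        return _original_linear(input, weight, bias)
    # split along the leading dimension so each underlying GEMM stays
    # well below the int32 limit, then stitch the results back together
    parts = [_original_linear(p, weight, bias) for p in input.split(chunk, dim=0)]
    return torch.cat(parts, dim=0)

torch.nn.functional.linear = chunked_linear  # hot-patch

If the oversized dimension turns out to be the reduction or output dimension instead, you would have to split the weight matrix rather than the input.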

Best regards

Thomas


Thanks for your help.

For people getting this error and ending up on this post, please know that it can also be caused by a mismatch between the dimensions of your input tensor and the dimensions of your nn.Linear module (e.g. x.shape = (a, b) and nn.Linear(c, c, bias=False) with b not matching c).

It is a bit sad that PyTorch doesn’t give a more explicit error message.
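
For illustration, a minimal made-up case of that kind of mismatch (the shapes here are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(4, 8, device="cuda")               # last dimension is 8 ...
layer = nn.Linear(16, 16, bias=False).to("cuda")   # ... but the layer expects 16 input features
out = layer(x)  # depending on the PyTorch version, this raises a clear
                # shape-mismatch error or the opaque cuBLAS error above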


@Jeremy_Cochoy This was really helpful. Solved my issue.


@Jeremy_Cochoy Thanks for your comments!

@Jeremy_Cochoy Thanks!

Hello @Jeremy_Cochoy
I have added an nn.Linear(512,10) layer to my model and the shape of the input that goes into this layer is torch.Size([32,512,1,1]). I have tried reducing the batch size from 128 to 64 and now to 32, but each of these gives me the same error.
Any idea what could be going wrong?

I think you want to transpose the dimensions of your input tensor before and after the layer (the Linear — PyTorch 1.9.0 documentation says it expects an N x * x C_in tensor, and you are giving it a 32 x … x 1 tensor).

Something like linear(x.transpose(1,3)).transpose(1,3) ?
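
Roughly, an untested sketch using the shapes from your post:

import torch
import torch.nn as nn

fc = nn.Linear(512, 10).cuda()
x = torch.randn(32, 512, 1, 1, device="cuda")

# move the 512 channels to the last dimension, apply the layer, move back
out = fc(x.transpose(1, 3)).transpose(1, 3)   # -> shape (32, 10, 1, 1)

Since the two trailing dimensions are 1 here, x.flatten(1), which gives a (32, 512) tensor, would work as well.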