RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

It seems that the size of one of your operands is too large to fit in an int32 (or is negative, but that seems unlikely).
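As a quick sanity check, you can compare the sizes that cuBLAS will see against the int32 limit before calling into the op. This is just my sketch, not a PyTorch API; the mapping of the shapes of F.linear(inp, weight) onto the gemm arguments m, n, k below is inferred from the error messages further down:

INT32_MAX = 2**31 - 1

def oversized_dims(inp_shape, weight_shape):
    # For out = F.linear(inp, weight): m = weight rows, n = inp rows,
    # k = the shared inner dimension (inferred mapping, see below).
    n, k = inp_shape
    m = weight_shape[0]
    return [name for name, dim in (("m", m), ("n", n), ("k", k))
            if dim > INT32_MAX]

If this returns a non-empty list for the shapes of your failing op, you have found the culprit dimension.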

Recent PyTorch versions should give a better error message:

import torch

LARGE = 2**31 + 1  # just past the int32 limit
# Push each GEMM dimension past the limit in turn (k, then m, then n):
for i, j, k in [(1, 1, LARGE), (1, LARGE, 1), (LARGE, 1, 1)]:
    inp = torch.randn(i, k, device="cuda", dtype=torch.half)
    weight = torch.randn(j, k, device="cuda", dtype=torch.half)
    try:
        torch.nn.functional.linear(inp, weight)
    except RuntimeError as e:
        print(e)
    del inp
    del weight
This prints:

at::cuda::blas::gemm<float> argument k must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument m must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument n must be non-negative and less than 2147483647 but got 2147483649

But they only report the problem; they don’t work around it. (It needs a lot of memory to trigger the bug…)
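If it turns out that only the batch dimension of the input is oversized, one possible workaround (my sketch, not something PyTorch does for you) is to apply the linear in chunks that each stay under the limit, at the cost of extra kernel launches. Note this only helps for the batch dimension; an oversized inner dimension k would need summed partial products instead:

import torch

def chunked_linear(inp, weight, chunk=2**30):
    # Split along the batch dimension, run each piece separately,
    # and concatenate the results.
    return torch.cat(
        [torch.nn.functional.linear(part, weight)
         for part in inp.split(chunk, dim=0)],
        dim=0,
    )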

Maybe you can get a credible backtrace and record the input shapes of the operation that fails.
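One way to record those shapes, sketched here under the assumption that the failing op is a Linear inside an nn.Module, is to attach forward pre-hooks (register_forward_pre_hook is a standard nn.Module API); the last entry recorded before the crash identifies the failing call:

import torch

def record_linear_shapes(model, records):
    # Attach a forward pre-hook to every nn.Linear so the input and
    # weight shapes are appended to `records` just before each matmul.
    handles = []
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear):
            def hook(m, inputs, name=name):
                records.append(
                    (name, tuple(inputs[0].shape), tuple(m.weight.shape)))
            handles.append(mod.register_forward_pre_hook(hook))
    return handles  # call .remove() on each handle when done

For example, running a model once with the hooks attached fills `records` with one (module name, input shape, weight shape) entry per Linear call.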

Best regards

Thomas