RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

tom · June 20, 2021, 5:10pm

In order of difficulty:

make the batch size smaller,
make a minimal reproducing example (i.e. just two or three random inputs, e.g. from torch.randn, and the call to torch.nn.functional.linear) and file a bug,
hot-patch torch.nn.functional.linear with a workaround (splitting the operation into multiple linear or matmul calls),
submit a PR with a fix in PyTorch and discuss whether you can add a test or whether it’d take a prohibitively large amount of GPU memory to run (or hire someone to do so).

Best regards

Thomas