It seems that one of your operand dimensions is too large to fit in an int32 (or is negative, but that seems unlikely).
I thought that recent PyTorch would give a better error message, and indeed it does:
import torch

LARGE = 2**31 + 1
# each tuple puts the oversized dimension into a different gemm argument
for i, j, k in [(1, 1, LARGE), (1, LARGE, 1), (LARGE, 1, 1)]:
    inp = torch.randn(i, k, device="cuda", dtype=torch.half)
    weight = torch.randn(j, k, device="cuda", dtype=torch.half)
    try:
        torch.nn.functional.linear(inp, weight)
    except RuntimeError as e:
        print(e)
    del inp
    del weight
at::cuda::blas::gemm<float> argument k must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument m must be non-negative and less than 2147483647 but got 2147483649
at::cuda::blas::gemm<float> argument n must be non-negative and less than 2147483647 but got 2147483649
But they don’t work around it. (Each of these tensors holds 2**31 + 1 fp16 elements, i.e. roughly 4 GiB, so it takes a lot of GPU memory to trigger the check…)
Maybe you can get a credible backtrace and record the input shapes of the operation that fails; a sketch of one way to do the latter is below.
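If the failing op is indeed F.linear (an assumption; if the backtrace points elsewhere, wrap that op instead), a minimal sketch is to wrap the function before running your model:

import torch.nn.functional as F

_orig_linear = F.linear

def logged_linear(input, weight, bias=None):
    # log the operand shapes before the cuBLAS call; the last line
    # printed before the RuntimeError shows the offending shapes
    print("linear:", tuple(input.shape), tuple(weight.shape))
    return _orig_linear(input, weight, bias)

F.linear = logged_linear

This works because nn.Linear looks up F.linear at call time, so the wrapper also catches calls made from inside modules.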
Best regards
Thomas