I encountered an "insufficient resources" error with the sparse matrix multiplication torch.sparse.mm:
RuntimeError: CUDA error: insufficient resources when calling cusparseSpGEMM_compute( handle, opA, opB, &alpha, matA, matB, &beta, matC, computeType, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &bufferSize2, dBuffer2)
The error behaves unexpectedly for me: multiplying certain matrices A1 and B raises it, while multiplying A2 and B does not, even though A2 was constructed to contain all non-zeros of A1 plus a few additional ones.
One example where I observed this error pattern is:
import torch
# A1: two non-zeros, both in column 2
a1 = torch.sparse_coo_tensor(torch.tensor([[10203, 220497], [2, 2]]), torch.ones(2), size=[220500, 3]).to('cuda:0')
# A2: a non-zero in column 2 of every row (a superset of A1's non-zeros)
a2 = torch.sparse_coo_tensor(torch.stack([torch.arange(220500), torch.full((220500,), 2)]), torch.ones(220500), size=[220500, 3]).to('cuda:0')
b = torch.sparse_coo_tensor(torch.tensor([[2, 2], [2, 8707]]), torch.ones(2), size=[3, 220500]).to('cuda:0')
# no error in multiplication:
torch.sparse.mm(a2,b)
# insufficient resources error in multiplication:
torch.sparse.mm(a1,b)
Since the operation with A2 should use at least as much memory as the one with A1 (by A2's construction), this seems very strange to me. Is there some unusual behaviour in how this operation allocates memory?
I am using PyTorch 2.0.1 with CUDA 11.8.
Does anyone have an idea why I get this error and/or how I could circumvent it?
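One workaround I am considering, sketched below: catch the RuntimeError and retry the multiplication on the CPU, since the CPU path of torch.sparse.mm does not go through cuSPARSE. The helper name sparse_mm_cpu_fallback is my own, not a PyTorch API, and this assumes the failure is specific to the cuSPARSE SpGEMM path:

```python
import torch

def sparse_mm_cpu_fallback(a, b):
    """Try torch.sparse.mm on the tensors' current device; on a
    RuntimeError (e.g. the cuSPARSE 'insufficient resources' error),
    retry on the CPU and move the result back."""
    try:
        return torch.sparse.mm(a, b)
    except RuntimeError:
        # The CPU implementation does not use cuSPARSE, so it should
        # avoid the cusparseSpGEMM_compute failure (at a speed cost).
        return torch.sparse.mm(a.cpu(), b.cpu()).to(a.device)
```

This obviously trades speed for robustness on the affected inputs, so I would still prefer to understand the underlying cause.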
Thanks!