Insufficient resources when calling cusparseSpGEMM

Hello, I am getting the error

CUDA error: insufficient resources when calling
`cusparseSpGEMM_compute(handle, opA, opB, &alpha, matA, matB, &beta, matC, computeType, CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &bufferSize2, dBuffer2)`

when I vary the size of the sparse matrices being multiplied on the GPU (that is, the dimensions given when creating a new matrix, not the actual number of stored values).

  • How do I debug this error in general?
  • What are the maximum dimensions allowed for this operation on CUDA?
  • Why do I get this error only when using a GPU, not the CPU?

Could you post a minimal, executable code snippet to reproduce the issue as well as the output of `python -m torch.utils.collect_env`, please?
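
For reference, a reproduction along these lines might look like the sketch below. It is not the original poster's code. It assumes the failure comes from multiplying two sparse COO tensors with `torch.sparse.mm`, and the helper names (`random_coo_entries`, `spgemm_repro`) and the random test data are made up for illustration.

```python
import random


def random_coo_entries(n_rows, n_cols, nnz, seed=0):
    """Return (rows, cols, values) lists describing nnz random entries.

    Hypothetical helper: pure Python, so the inputs can be inspected
    without PyTorch or a GPU.
    """
    rng = random.Random(seed)
    rows = [rng.randrange(n_rows) for _ in range(nnz)]
    cols = [rng.randrange(n_cols) for _ in range(nnz)]
    vals = [rng.random() for _ in range(nnz)]
    return rows, cols, vals


def spgemm_repro(n, nnz, device=None):
    """Multiply two random n x n sparse COO tensors and return the product.

    Assumption: the reported error is triggered by a sparse @ sparse
    product like this one; adapt it to the actual failing call.
    """
    import torch  # imported here so the helper above works without PyTorch

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    def make(seed):
        rows, cols, vals = random_coo_entries(n, n, nnz, seed)
        indices = torch.tensor([rows, cols], dtype=torch.int64)
        values = torch.tensor(vals, dtype=torch.float32)
        sparse = torch.sparse_coo_tensor(indices, values, (n, n), device=device)
        # coalesce() sums duplicate coordinates and sorts the indices
        return sparse.coalesce()

    a, b = make(0), make(1)
    print(f"device={device} shape={tuple(a.shape)} nnz={a.values().numel()}")
    return torch.sparse.mm(a, b)
```

Calling `spgemm_repro(n, nnz)` with the dimensions that fail, and again with `device="cpu"`, would show whether the error depends on the matrix shape alone or also on the number of stored values.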