Guide for implementing CSR/CSC sparse * dense -> dense multiplication

Hi everyone,

As the title says, I've reached a point where I specifically need CSR/CSC sparse matrices and need to multiply them with dense matrices. It'll be part of the forward pass of a neural network, so I need it to run in parallel on the GPU, and perhaps the CPU as well, but GPU is my main focus.
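To make the operation concrete, here is the equivalent in SciPy (purely illustrative; I need the same sparse * dense -> dense behavior inside PyTorch, on the GPU):

```python
import numpy as np
from scipy.sparse import random as sparse_random

# A: sparse matrix stored in CSR format
A = sparse_random(4, 6, density=0.25, format="csr", dtype=np.float64)

# B: ordinary dense matrix
B = np.random.rand(6, 3)

# Sparse * dense multiplication yields a dense result
C = A @ B
print(C.shape)  # (4, 3), a dense ndarray
```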

While I have been using PyTorch for a couple of months, I have never implemented a custom C++ extension before, and I am wondering if anyone could give me some guidance on what I should do and how involved this will probably end up being. I presume ATen does not support CSR/CSC matrices at this point, otherwise PyTorch itself would already expose them (my guess).

Would I need to implement a separate CSR/CSC tensor class in PyTorch and then implement the operation on top of it with something like cuSPARSE, or could it all be done in a single extension?
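For context on what such a class would need to hold: a CSR matrix is just three flat arrays (values, column indices, row pointers), which is also the layout that cuSPARSE's SpMM routines consume. A minimal CPU-only reference of the multiply, with a hypothetical helper name, just to pin down the semantics I'm after:

```python
import numpy as np

def csr_matmul_dense(values, col_idx, row_ptr, B):
    """Reference CSR * dense multiply: C[i] += v * B[col] for each nnz in row i."""
    n_rows = len(row_ptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            C[i] += values[k] * B[col_idx[k]]
    return C

# CSR encoding of the 2x3 matrix [[1, 0, 2],
#                                 [0, 3, 0]]
values  = np.array([1.0, 2.0, 3.0])
col_idx = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 3])

B = np.eye(3)
print(csr_matmul_dense(values, col_idx, row_ptr, B))
# multiplying by the identity recovers the dense matrix
```

A GPU version would hand these same three arrays (plus the dense operand) to a cuSPARSE SpMM call from the C++ extension instead of looping in Python.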

I do not need autograd functionality or anything to do with learning; this will be a post-training operation.

In any case, thanks for the help and for making PyTorch a great platform to work with!