I’m working on an experiment and would like to measure the speedup I can get from using cuBLAS (specifically its 2:4 structured sparsity) over the usual PyTorch functions.
I’ve got all of the setup I need except for actually calling the cuBLAS library. Essentially, I have a forward function in which I just want to perform a matmul using cuBLAS.
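For concreteness, here is roughly the shape of what I have (a minimal sketch with a NumPy stand-in; `Linear24` is just a hypothetical name, and the `np.matmul` line marks the spot where I want the explicit cuBLAS sparse GEMM to go):

```python
import numpy as np

class Linear24:
    """Toy layer whose weight has already been 2:4-pruned offline."""

    def __init__(self, weight: np.ndarray):
        self.weight = weight  # shape (out_features, in_features)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # TODO: replace this dense matmul with a direct cuBLAS
        # 2:4-sparse GEMM call -- this is the part I'm missing.
        return x @ self.weight.T
```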
Everything I find online only talks about enabling TF32, which is not what I want.
There was also this issue (Sparse CSR layout GPU backend tracking issue · Issue #60854 · pytorch/pytorch · GitHub), which covers almost what I want, but it is difficult to test since it has not made it into any release yet.
To be clear, I am perfectly fine with a hacky method; this is just a timing experiment, not anything production-like. If my forward function can explicitly call cuBLAS, that is good enough; I do not care about portability.
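In case it helps, this is the 2:4 pattern I mean, sketched in plain NumPy (keep the two largest-magnitude values in every contiguous group of four along the last axis); the open question is only how to feed such a weight to cuBLAS:

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Apply 2:4 structured sparsity along the last axis: in every
    contiguous group of 4 weights, zero out the 2 with the smallest
    magnitude."""
    assert w.shape[-1] % 4 == 0
    groups = w.reshape(-1, 4)
    # indices of the 2 smallest-|w| entries in each group of 4
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
Wp = prune_2_4(W)
# every group of 4 now has exactly 2 nonzeros
assert (np.count_nonzero(Wp.reshape(-1, 4), axis=1) == 2).all()
```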
Is anyone able to provide guidance in this respect?