How to directly call cuBLAS from PyTorch?

I’m working on an experiment and would like to measure the speedups I can get from using cuBLAS (specifically its 2:4 structured sparsity) over the usual PyTorch functions.
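For context, 2:4 (semi-structured) sparsity means that in every contiguous group of four values, at most two are nonzero. A minimal illustration of pruning a row to that pattern — a sketch in pure Python, no GPU needed; the function name is just for illustration:

```python
def prune_2_4(row):
    """Zero out the two smallest-magnitude values in every group of 4,
    producing the 2:4 (semi-structured) sparsity pattern."""
    assert len(row) % 4 == 0
    out = list(row)
    for i in range(0, len(row), 4):
        group = range(i, i + 4)
        # Indices of the two smallest-magnitude entries in this group.
        drop = sorted(group, key=lambda j: abs(out[j]))[:2]
        for j in drop:
            out[j] = 0.0
    return out

print(prune_2_4([0.9, -0.1, 0.4, 0.05, 1.2, 0.3, -0.8, 0.2]))
# → [0.9, 0.0, 0.4, 0.0, 1.2, 0.0, -0.8, 0.0]
```

The sparse tensor cores exploit exactly this guarantee (two nonzeros per group of four) to skip half the multiply-accumulates.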

I’ve got all of the setup I need except for actually calling the cuBLAS library. Essentially, I have a forward function in which I just want to perform a matmul using cuBLAS.
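One hacky route that fits a timing experiment like this is to JIT-compile a tiny C++ extension that grabs PyTorch's own cuBLAS handle and calls `cublasSgemm` directly. A sketch, assuming float32 contiguous CUDA tensors and a CUDA build of PyTorch with `nvcc` available; names like `cublas_mm` are my own:

```python
# Sketch: call cublasSgemm directly from a PyTorch C++ extension.
cpp_src = r"""
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <cublas_v2.h>

torch::Tensor cublas_mm(torch::Tensor A, torch::Tensor B) {
    // A: (M, K), B: (K, N); float32, contiguous, CUDA tensors.
    int M = A.size(0), K = A.size(1), N = B.size(1);
    auto C = torch::empty({M, N}, A.options());
    float alpha = 1.0f, beta = 0.0f;
    // Reuse the cuBLAS handle PyTorch has already created.
    cublasHandle_t handle = at::cuda::getCurrentCUDABlasHandle();
    // cuBLAS is column-major: computing C^T = B^T * A^T on swapped
    // operands yields a row-major C.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, M, K, &alpha,
                B.data_ptr<float>(), N,
                A.data_ptr<float>(), K,
                &beta, C.data_ptr<float>(), N);
    return C;
}
"""

def build_extension():
    # Deferred import so the source string can be inspected
    # without a GPU-enabled PyTorch install.
    from torch.utils.cpp_extension import load_inline
    return load_inline(
        name="cublas_direct",
        cpp_sources=cpp_src,
        functions=["cublas_mm"],
        with_cuda=True,
        extra_ldflags=["-lcublas"],
    )
```

Then `build_extension().cublas_mm(a, b)` can be dropped into the forward pass and compared against `torch.mm`. The same extension skeleton should extend to the cuSPARSELt 2:4 routines, though those need extra setup (compression and a matmul plan).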

Everything I find online only discusses enabling TF32, which is not what I want.

There was also this issue, Sparse CSR layout GPU backend tracking issue · Issue #60854 · pytorch/pytorch · GitHub, which covers almost exactly what I want, but it is difficult to test since it hasn’t landed in any release yet.

To be clear, I’m fine with a hacky method — it’s just a timing experiment, not anything production-like. If my forward function can explicitly call cuBLAS, that’s good enough; I don’t care about portability.
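One caveat for the timing side: CUDA kernel launches are asynchronous, so naive wall-clock timing measures only the launch, not the GPU work. A minimal sketch of a timing helper (the `sync` hook is where `torch.cuda.synchronize` would go; shown here GPU-agnostic so it also runs on CPU):

```python
import time

def time_fn(fn, warmup=10, iters=100, sync=None):
    """Average wall-clock seconds per call of fn().
    Pass sync=torch.cuda.synchronize when timing CUDA work, since
    kernel launches return before the GPU has finished."""
    for _ in range(warmup):   # warm up caches / JIT / autotuning
        fn()
    if sync:
        sync()                # make sure warmup work has drained
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync:
        sync()                # wait for the timed work to finish
    return (time.perf_counter() - start) / iters
```

For finer-grained GPU measurements, `torch.cuda.Event(enable_timing=True)` pairs are the usual alternative to host-side timers.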

Is anyone able to provide guidance in this respect?

Could you post more details on how you deduced your matmul isn’t calling into cuBLAS? Or is your question about circumventing Python in some way?