Torch.matmul calls nvjet_tst kernel in torch 2.7.1

Hi, I’m testing torch 2.7.1 matmul performance with CUDA 12.6 on an H100 GPU. Surprisingly, matmul now calls an nvjet kernel, whereas torch 2.6.0 called an sm90_xmma kernel for the same workload.
Does anyone know why this behaviour is happening?
Thanks a lot for the kind help!
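For anyone who wants to reproduce this, one way to see which CUDA kernel a matmul dispatches to is the PyTorch profiler. This is a minimal sketch (the matrix size, dtype, and the helper name `show_matmul_kernels` are illustrative, not from the original post):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def show_matmul_kernels(n=4096):
    """Profile one matmul and print the names of the CUDA kernels it launched."""
    if not torch.cuda.is_available():
        print("CUDA not available; nothing to profile")
        return
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)           # warm-up so one-time setup stays out of the trace
    torch.cuda.synchronize()
    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        torch.matmul(a, b)
        torch.cuda.synchronize()
    # Kernel names such as nvjet_tst_* or sm90_xmma_* appear in this table
    print(prof.key_averages().table(sort_by="cuda_time_total"))

show_matmul_kernels()
```

On an H100 with torch 2.7.1 the table should list an nvjet kernel; on torch 2.6.0 an sm90_xmma kernel instead.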


This is expected as cuBLAS selects the fastest kernel for the given workload via its heuristics.


Thanks for the clarification! Very helpful 🙂

Is there any way we can force it not to use nvjet kernels and only use cutlass and cuBLAS kernels?

Not directly, no. If you want to select a specific algorithm, you could try to implement your own backend/heuristic via `cublasLtMatmulAlgoGetHeuristic`. However, note that it’s not guaranteed that other algorithms besides the nvjet engine exist for every workload.
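A much lighter-weight knob that already exists in PyTorch, and is a different technique from the custom-heuristic approach above, is `torch.backends.cuda.preferred_blas_library`. It only selects between the cuBLAS and cuBLASLt (and, where built, CK) backends rather than pinning a specific kernel, but switching backends can change which kernels the library heuristics end up choosing. A minimal sketch, assuming a torch version where this setter is available:

```python
import torch

# Select the plain cuBLAS backend for matmuls instead of cuBLASLt.
# Note: this chooses a backend library, not an individual kernel; the
# library's own heuristics still pick the kernel within that backend.
if hasattr(torch.backends.cuda, "preferred_blas_library"):
    torch.backends.cuda.preferred_blas_library("cublas")   # or "cublaslt"
```

Whether this avoids nvjet kernels in practice depends on the workload and the cuBLAS version, so it is worth verifying with a profiler trace.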