Torch.matmul calls nvjet_tst kernel in torch 2.7.1

Hi, I’m testing torch 2.7.1 matmul performance with CUDA 12.6 on an H100 GPU. Surprisingly, torch.matmul dispatches to an nvjet_tst kernel, whereas torch 2.6.0 used an sm90_xmma kernel for the same workload.
Does anyone know why this is happening?
Thanks a lot for the kind help!
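
For reference, this is roughly how I’m checking which CUDA kernels matmul dispatches to, using torch.profiler (the shapes and dtype here are just an example, not the exact workload):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Example shapes/dtype; adjust to the workload being investigated.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

# Warm up so one-time setup does not show up in the trace.
for _ in range(3):
    torch.matmul(a, b)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    torch.matmul(a, b)
    torch.cuda.synchronize()

# Kernel names (e.g. nvjet_* or sm90_xmma_*) show up in this table.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```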

This is expected: cuBLAS selects the fastest kernel for the given workload via its heuristics, and the set of kernels it chooses from can change between cuBLAS versions.
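
Since the specific kernel is an internal cuBLAS detail, the more meaningful comparison is whether measured matmul time changed between the two torch builds. A minimal timing sketch (shapes and dtype are assumptions, match them to your workload):

```python
import torch
from torch.utils import benchmark

# Assumed shapes/dtype; the heuristic choice depends on the actual workload.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

timer = benchmark.Timer(
    stmt="torch.matmul(a, b)",
    globals={"a": a, "b": b},
)

# Compare this measurement across torch 2.6.0 and 2.7.1 rather than kernel names.
print(timer.blocked_autorange())
```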

Thanks for the clarification! Very helpful 🙂