Performance Issue: torch.matmul selecting cutlass sm75 kernel for A100

Thanks. Does torch control when cuBLAS is used vs. cutlass? In my profile with torch 2.7, all of the kernels are cutlass and none are cuBLAS. On an H100 node, I saw a combination of cutlass sm75, cutlass sm80, and a cuBLAS sm90 kernel.

  1. Does torch control whether the backend is cuBLAS or cutlass, or is that decision made inside cuBLAS?
  2. If torch controls it, are there any pointers to how torch chooses between cuBLAS and cutlass?
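
For reference, here is a minimal sketch of how a profile like the one described above could be collected and summarized. It assumes `torch.profiler` on a CUDA machine. The helper names (`bucket_kernel`, `summarize`, `profile_matmul_kernels`) and the matrix size and dtype are made up for illustration. The bucketing is a plain substring match on the kernel name, so it only reports what the kernel is called, not which library actually dispatched it.

```python
from collections import Counter


def bucket_kernel(kernel_name: str) -> str:
    """Bucket a kernel by name only (heuristic, not proof of the dispatching library)."""
    return "cutlass-named" if "cutlass" in kernel_name.lower() else "other"


def summarize(kernel_names):
    """Count how many kernel names fall into each bucket."""
    return Counter(bucket_kernel(name) for name in kernel_names)


def profile_matmul_kernels(n: int = 4096, dtype_name: str = "float16"):
    """Return the names of the GPU kernels launched by one torch.matmul call.

    Returns an empty list when no CUDA device is available.
    The size and dtype defaults are arbitrary choices for this sketch.
    """
    import torch
    from torch.autograd import DeviceType
    from torch.profiler import ProfilerActivity, profile

    if not torch.cuda.is_available():
        return []

    dtype = getattr(torch, dtype_name)
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)

    # Warm-up so one-time initialization does not show up in the profile.
    torch.matmul(a, b)
    torch.cuda.synchronize()

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        torch.matmul(a, b)
        torch.cuda.synchronize()

    # Keep only events that ran on the GPU; their keys are the kernel names.
    return [evt.key for evt in prof.key_averages()
            if evt.device_type == DeviceType.CUDA]


if __name__ == "__main__":
    names = profile_matmul_kernels()
    for name in names:
        print(bucket_kernel(name), name)
    print(dict(summarize(names)))
```

Running this on both the A100 and the H100 node would make it easier to compare the kernel names each one selects for the same shape and dtype.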