Performance Issue: torch.matmul selecting cutlass sm75 kernel for A100

torchlearner · June 11, 2025, 7:55pm

Thanks for your reply.
In this case, I found that the sm75 kernel was selected whenever the input tensor was not contiguous. When the tensor was contiguous, the sm80 kernel was always chosen.
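
For anyone hitting the same thing, here is a minimal sketch of the workaround this suggests: check `is_contiguous()` and call `.contiguous()` before `torch.matmul`. The helper name `matmul_contiguous` and the tensor shapes are made up for illustration, and I have not verified that this changes kernel selection on every PyTorch/CUDA version, so treat it as a starting point and not a guaranteed fix.

```python
import torch


def matmul_contiguous(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: copy non-contiguous operands into contiguous
    memory before calling torch.matmul."""
    if not a.is_contiguous():
        a = a.contiguous()  # allocates a new tensor with a dense layout
    if not b.is_contiguous():
        b = b.contiguous()
    return torch.matmul(a, b)


# Falls back to CPU so the snippet runs anywhere; kernel selection only
# matters on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(64, 128, device=device)
w = torch.randn(64, 256, device=device)

xt = x.t()  # transposed view with shape (128, 64); shares storage with x
print(xt.is_contiguous())  # the transposed view is not contiguous

out = matmul_contiguous(xt, w)
print(out.shape)
```

Note that `.contiguous()` makes a copy, so whether it is a net win depends on the tensor sizes. To confirm which kernel actually runs, the kernel names can be inspected with a profiler such as `torch.profiler` or Nsight Systems.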