Autotune cache key for matrix multiplication

Autotune is spending ages on sizes like `AUTOTUNE mm(8192x2304, 2304x256)` (where 2304 is roughly the seqlen).

Does AUTOTUNE mm round sizes up to a multiple of, e.g., 32 elements and cache the results under the corresponding key? If not, is it possible to tell it to do so?
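
To make the question concrete, here is a minimal sketch of the kind of bucketing I have in mind. The helpers `round_up` and `bucketed_key` are hypothetical names of my own, not anything from the autotuner, and I don't know whether it does anything like this internally.

```python
def round_up(n: int, multiple: int = 32) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple


def bucketed_key(m: int, k: int, n: int, multiple: int = 32) -> tuple:
    """Hypothetical cache key for mm(m x k, k x n) with each dim rounded up."""
    return (round_up(m, multiple), round_up(k, multiple), round_up(n, multiple))


# Several nearby seqlens would then share a small number of cache entries.
seqlens = [2304, 2290, 2305, 2336]
keys = {bucketed_key(8192, s, 256) for s in seqlens}
print(len(seqlens), "seqlens ->", len(keys), "keys")
```

With a scheme like this, only one tuning run per bucket would be needed rather than one per exact seqlen.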

I’m worried that it will re-tune for every distinct seqlen :frowning:

Thanks!