How to use the new autotune introduced in https://github.com/pytorch/torchdynamo/pull/1338

Hello,
How can we use the new caching_autotune decorator, introduced in the PR linked in the title, with a newly defined kernel? Here is an example of the kernel's signature:

def kernel_fma(
    C,  # Pointers to matrices
    ACT_INPUTS,
    A,
    B,
    bias,
    # Matrix dimensions
    M,
    N,
    K,
    CACHE_KEY_M,
    CACHE_KEY_N,
    CACHE_KEY_K,
    # The stride variables represent how much to increase the pointer by when
    # moving by 1 element in a particular dimension. E.g. stride_im is how much
    # to increase the input pointer by to get the element one row down.
    stride_om,
    stride_on,
    stride_im,
    stride_ik,
    stride_wn,
    stride_wk,
    # Meta-parameters
    BLOCK_M: tl.constexpr,
    GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    # split k not used, not performant with activation, kept because early_config_prune is expecting it
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    BIAS: tl.constexpr,
    SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
)

I introduced this decorator :slight_smile:

# CachingAutotuner is the class added by the PR above; I import it like this
# (assuming the torchdynamo repo layout)
from torchinductor.triton_ops.autotune import CachingAutotuner


def autotune(configs, meta, save_cache_hook=False):
    def decorator(fn):
        return CachingAutotuner(
            fn,
            meta=meta,
            configs=configs,
            # save_cache_hook defaults to False to force autotuning
            save_cache_hook=save_cache_hook,
        )

    return decorator

based on this test example: pytorch/test_torchinductor.py at fae821c2f166fccab6a3c34e293c7268f61e82ba · pytorch/pytorch · GitHub
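
Concretely, I am then applying it to kernel_fma roughly like this. This is only a sketch: the triton.Config values are placeholders, and the meta dict is my guess at what gets forwarded to triton.compile for each config (mirroring what the inductor codegen seems to pass); I have not confirmed any of it:

import triton

# placeholder tiling configurations to tune over
configs = [
    triton.Config(
        {"BLOCK_M": 64, "BLOCK_N": 32, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_stages=4,
        num_warps=4,
    ),
    triton.Config(
        {"BLOCK_M": 128, "BLOCK_N": 64, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_stages=3,
        num_warps=8,
    ),
]

# my guess at meta: the signature / device / constants that triton.compile expects,
# with one entry per non-constexpr argument of kernel_fma
meta = {
    "signature": {
        0: "*fp16", 1: "*fp16", 2: "*fp16", 3: "*fp16", 4: "*fp16",  # C, ACT_INPUTS, A, B, bias
        5: "i32", 6: "i32", 7: "i32",                                # M, N, K
        8: "i32", 9: "i32", 10: "i32",                               # CACHE_KEY_M, CACHE_KEY_N, CACHE_KEY_K
        11: "i32", 12: "i32", 13: "i32", 14: "i32", 15: "i32", 16: "i32",  # strides
    },
    "device": 0,
    "constants": {},  # presumably EVEN_K, BIAS, SAVE_ACT_INPUTS, ACTIVATION go here?
}

# kernel_fma is the @triton.jit function whose signature is shown above
tuned_kernel_fma = autotune(configs, meta)(kernel_fma)

and then launching it with something like tuned_kernel_fma.run(*args, grid=grid, stream=stream), the way the generated inductor code seems to call these kernels, though I am not sure that is the intended entry point.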

But I thought there might be a better way to use caching_autotune.

Thanks in advance,