Hello,
Could you please explain how to use the new decorator related to caching_autotune, introduced in:
with a newly defined kernel? Here is an example of the kernel's signature (an example of the configs I pass is shown right after it):
def kernel_fma(
    C,  # Pointers to matrices
    ACT_INPUTS,
    A,
    B,
    bias,
    # Matrix dimensions
    M,
    N,
    K,
    CACHE_KEY_M,
    CACHE_KEY_N,
    CACHE_KEY_K,
    # The stride variables represent how much to increase the ptr by when moving by 1
    # element in a particular dimension. E.g. stride_am is how much to increase a_ptr
    # by to get the element one row down (A has M rows)
    stride_om,
    stride_on,
    stride_im,
    stride_ik,
    stride_wn,
    stride_wk,
    # Meta-parameters
    BLOCK_M: tl.constexpr,
    GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    # split k not used, not performant with activation, kept because early_config_prune is expecting it
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    BIAS: tl.constexpr,
    SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
):
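For reference, these are the kinds of configs I pass in (see the usage sketch further down); the values are arbitrary placeholders rather than properly tuned ones:

import triton

# Placeholder configs covering the kernel's meta-parameters; the numbers are
# examples only, not tuned values.
CONFIGS = [
    triton.Config(
        {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_warps=4,
        num_stages=3,
    ),
    triton.Config(
        {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_warps=8,
        num_stages=3,
    ),
]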
I introduced the following decorator:
def autotune(configs, meta, save_cache_hook=False):
    def decorator(fn):
        return CachingAutotuner(
            # force autotune by setting save_cache_hook to False
            fn,
            meta=meta,
            configs=configs,
            save_cache_hook=save_cache_hook,
        )
    return decorator
It is based on this test example: pytorch/test_torchinductor.py at fae821c2f166fccab6a3c34e293c7268f61e82ba · pytorch/pytorch · GitHub
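For completeness, this is roughly how I am attaching it to the kernel right now; the configs come from the sketch above, the meta dict is empty because I am not sure what CachingAutotuner expects there, and the kernel body is elided:

import triton
import triton.language as tl

# CachingAutotuner is assumed to be importable from torchinductor's autotune module.
META = {}  # unclear to me what belongs here, which is part of my question

@autotune(configs=CONFIGS, meta=META, save_cache_hook=False)
@triton.jit
def kernel_fma(
    C, ACT_INPUTS, A, B, bias,
    M, N, K,
    CACHE_KEY_M, CACHE_KEY_N, CACHE_KEY_K,
    stride_om, stride_on,
    stride_im, stride_ik,
    stride_wn, stride_wk,
    BLOCK_M: tl.constexpr, GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
    SPLIT_K: tl.constexpr, EVEN_K: tl.constexpr,
    BIAS: tl.constexpr, SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
):
    ...  # kernel body unchanged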
But I thought there might be a better way to use caching_autotune.
Thanks in advance,