Hello,
Could you please explain how to use the new decorator related to caching_autotune, introduced in:
with a newly defined kernel? Here is an example of the kernel's signature (an example of the configs I pass is shown right after it):
def kernel_fma(
    C,  # Pointers to matrices
    ACT_INPUTS,
    A,
    B,
    bias,
    # Matrix dimensions
    M,
    N,
    K,
    CACHE_KEY_M,
    CACHE_KEY_N,
    CACHE_KEY_K,
    # The stride variables represent how much to increase the ptr by when moving by 1
    # element in a particular dimension. E.g. stride_am is how much to increase a_ptr
    # by to get the element one row down (A has M rows)
    stride_om,
    stride_on,
    stride_im,
    stride_ik,
    stride_wn,
    stride_wk,
    # Meta-parameters
    BLOCK_M: tl.constexpr,
    GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    # split k not used, not performant with activation, kept because early_config_prune is expecting it
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    BIAS: tl.constexpr,
    SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
):
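For reference, these are the kinds of configs I pass in (see the usage sketch further down); the values are arbitrary placeholders rather than properly tuned ones:

import triton

# Placeholder configs covering the kernel's meta-parameters; the numbers are
# examples only, not tuned values.
CONFIGS = [
    triton.Config(
        {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_warps=4,
        num_stages=3,
    ),
    triton.Config(
        {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 32, "GROUP_M": 8, "SPLIT_K": 1},
        num_warps=8,
        num_stages=3,
    ),
]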
I introduced the following decorator:
def autotune(configs, meta, save_cache_hook=False):
    def decorator(fn):
        return CachingAutotuner(
            # force autotune by setting save_cache_hook to False
            fn,
            meta=meta,
            configs=configs,
            save_cache_hook=save_cache_hook,
        )
    return decorator
It is based on this test example: pytorch/test_torchinductor.py at fae821c2f166fccab6a3c34e293c7268f61e82ba · pytorch/pytorch · GitHub
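For completeness, this is roughly how I am attaching it to the kernel right now; the configs come from the sketch above, the meta dict is empty because I am not sure what CachingAutotuner expects there, and the kernel body is elided:

import triton
import triton.language as tl

# CachingAutotuner is assumed to be importable from torchinductor's autotune module.
META = {}  # unclear to me what belongs here, which is part of my question

@autotune(configs=CONFIGS, meta=META, save_cache_hook=False)
@triton.jit
def kernel_fma(
    C, ACT_INPUTS, A, B, bias,
    M, N, K,
    CACHE_KEY_M, CACHE_KEY_N, CACHE_KEY_K,
    stride_om, stride_on,
    stride_im, stride_ik,
    stride_wn, stride_wk,
    BLOCK_M: tl.constexpr, GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
    SPLIT_K: tl.constexpr, EVEN_K: tl.constexpr,
    BIAS: tl.constexpr, SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
):
    ...  # kernel body unchanged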
But I thought there might be a better way to use caching_autotune.
Thanks in advance,