I am trying to benchmark torch.compile, including the compilation overhead in each of its modes. I call dynamo.reset() before each call to torch.compile, but this does not seem to clear the cache.
Below is the code I am running. mul is a simple function, and timeit simply forwards the given arguments to the function it is passed, calls it, and times the call. Whichever mode runs first has an overhead of ~1 sec; the following three take orders of magnitude less.
Please let me know if I am clearing the cache incorrectly. Thank you.
import torch
import torch._dynamo as dynamo  # assumed: "dynamo" here refers to torch._dynamo

# Reset dynamo state, then time torch.compile for each mode in turn.
dynamo.reset()
def_cmptm, mul_def = timeit(torch.compile, args=mul, kwargs={"mode": "default"})
dynamo.reset()
max_cmptm, mul_max = timeit(torch.compile, args=mul, kwargs={"mode": "max-autotune"})
dynamo.reset()
red_cmptm, mul_red = timeit(torch.compile, args=mul, kwargs={"mode": "reduce-overhead"})
dynamo.reset()
max_nc_cmptm, mul_max_nc = timeit(torch.compile, args=mul, kwargs={"mode": "max-autotune-no-cudagraphs"})
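
For context, here is a minimal sketch of what a helper like timeit does. The body is an assumption written for illustration, not the exact code: it wraps a bare args value into a tuple, times a single call with time.perf_counter, and returns (elapsed_seconds, result), which matches how it is called above.

```python
import time


def timeit(fn, args=(), kwargs=None):
    """Call fn(*args, **kwargs) once and return (elapsed_seconds, result).

    Hypothetical helper for illustration; the name and signature mirror
    the calls above, but the body is an assumption.
    """
    # Allow a single positional argument to be passed without a tuple,
    # e.g. timeit(torch.compile, args=mul).
    if not isinstance(args, tuple):
        args = (args,)
    if kwargs is None:
        kwargs = {}

    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed, result
```

With this shape, timeit(torch.compile, args=mul, kwargs={"mode": "default"}) times only the torch.compile(mul, mode="default") call itself and returns the compiled callable as the second element.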