Resetting cache in benchmark

AidanGoldfarb · February 16, 2024, 5:01pm

I am attempting to benchmark some things with torch.compile, including the overhead of compilation in different modes. I am calling dynamo.reset() before each call to torch.compile; however, this does not seem to be clearing the cache.

Below is the code I am executing. mul is a simple function; timeit takes a function plus its arguments, calls the function with them, and times the call. Whichever mode is compiled first has an overhead of ~1 sec; the following three take orders of magnitude less.

Please let me know if I am clearing the cache incorrectly, thank you.

dynamo.reset()
def_cmptm, mul_def = timeit(torch.compile, args=mul, kwargs={"mode": "default"})

dynamo.reset()
max_cmptm, mul_max = timeit(torch.compile, args=mul, kwargs={"mode": "max-autotune"})

dynamo.reset()
red_cmptm, mul_red = timeit(torch.compile, args=mul, kwargs={"mode": "reduce-overhead"})

dynamo.reset()
max_nc_cmptm, mul_max_nc = timeit(torch.compile, args=mul, kwargs={"mode": "max-autotune-no-cudagraphs"})
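
For reference, the timeit helper described above could look roughly like the sketch below. The signature is a guess based on how it is called in the snippet (a single positional argument passed as args, plus a kwargs dict); the real helper may differ.

```python
import time


def timeit(fn, args=None, kwargs=None):
    """Call fn(args, **kwargs) once and return (elapsed_seconds, result).

    Hypothetical helper: `args` is treated as ONE positional argument,
    matching the call timeit(torch.compile, args=mul, kwargs={...}).
    """
    kwargs = kwargs or {}
    start = time.perf_counter()
    result = fn(args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed, result
```

With this shape, timeit(torch.compile, args=mul, kwargs={"mode": "default"}) times only the call to torch.compile(mul, mode="default") itself, not any later call to the function it returns.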

AidanGoldfarb · February 19, 2024, 3:56pm

It appears the first run is simply the compiler warming up. Inserting a dummy compilation solves this. Interestingly, dynamo.reset() still has no effect on compile times. This surprises me, as I would think some information about a previously compiled function would be cached.
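
One possible explanation worth checking: torch.compile returns a wrapper and defers the actual compilation to the first call of that wrapper, so timing torch.compile alone would not include compilation work that a reset could affect. A minimal sketch for timing the first and second calls separately is below. The helper name and its parameters are made up for illustration; only torch.compile and torch._dynamo.reset in the comments are real PyTorch calls.

```python
import time


def time_first_two_calls(make_compiled, reset, *inputs):
    """Return (first_call_s, second_call_s) for a freshly built callable.

    Hypothetical helper: `make_compiled` builds the callable and `reset`
    clears whatever state should not carry over between measurements.
    """
    reset()                     # e.g. torch._dynamo.reset
    compiled = make_compiled()  # e.g. lambda: torch.compile(mul, mode="default")

    start = time.perf_counter()
    compiled(*inputs)           # first call: where deferred compilation would run
    first = time.perf_counter() - start

    start = time.perf_counter()
    compiled(*inputs)           # second call: expected to reuse compiled code
    second = time.perf_counter() - start
    return first, second


# Intended use with PyTorch (not executed here; x and y are placeholder tensors):
#   import torch
#   first, second = time_first_two_calls(
#       lambda: torch.compile(mul, mode="default"), torch._dynamo.reset, x, y)
```

If reset is doing its job, the first-call time should stay high after every reset, while the second-call time stays low. On a GPU, torch.cuda.synchronize() would be needed before each clock read, since kernels launch asynchronously.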

bhack · March 9, 2024, 1:20am

But it seems that the team is using dynamo.reset() to clear the cache:

AidanGoldfarb · March 10, 2024, 4:22pm

Perhaps; perhaps it will be documented someday :-). I figured it may have to do with the recompilation cache.

nd21 · April 23, 2025, 3:08am

I am facing a similar issue. My model’s peak memory usage is higher on the first run than on the second. From the second run onwards, the peak memory stays roughly the same. What might be the reason for this?