How to save torch.compile so we don't need to re-compile

Is it possible to save the torch.compile state so we don't need to re-compile each time we run the exact same model? Compilation takes over 15 minutes, and it's presumably doing the same work each time. We are running Llama-3 70B via tensor parallelism with int8 quantization.

Do you have this in your `.bashrc`?

```shell
export TORCHINDUCTOR_FX_GRAPH_CACHE=1
export TORCHINDUCTOR_CACHE_DIR=/home/ubuntu/.inductor_cache
```
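If editing `.bashrc` isn't convenient, the same variables can also be set from Python, as long as it happens before `torch` is imported and the first compile runs in that process. A minimal sketch (the cache-dir path here is just an example, not a required location):

```python
import os

# Assumption: these must be set before torch is imported / before the
# first torch.compile call in this process for the cache to take effect.
cache_dir = os.path.expanduser("~/.inductor_cache")
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
os.environ["TORCHINDUCTOR_CACHE_DIR"] = cache_dir

# Verify the variables are visible to the current process.
print(os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"])       # 1
print(os.environ["TORCHINDUCTOR_CACHE_DIR"] == cache_dir)  # True

# import torch
# compiled_model = torch.compile(model)  # subsequent runs can reuse the cache
```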

@drisspg I'm observing similar behavior. torch.compile takes several minutes at every run (and each additional recompile takes several more minutes), even though it's capturing the exact same model every time. Is it possible to force-cache its output? Or how can I understand why the cache isn't being hit?

`TORCHINDUCTOR_FX_GRAPH_CACHE=1` cuts the time for recompiles, but not for the first execution of the torch.compile-wrapped model.
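To understand why recompiles are happening in the first place (e.g. guard failures from changing input shapes), PyTorch's logging can report the reason for each recompile. A sketch of how to enable it, assuming `run_model.py` stands in for your actual launch script:

```shell
# Print the reason for every torch.compile recompilation (guard failures etc.)
TORCH_LOGS="recompiles" python run_model.py

# Keep the FX graph cache enabled at the same time so cached graphs are reused
TORCHINDUCTOR_FX_GRAPH_CACHE=1 \
TORCHINDUCTOR_CACHE_DIR="$HOME/.inductor_cache" \
TORCH_LOGS="recompiles" python run_model.py
```

Note the cache is keyed on the captured graph and its inputs, so anything that changes between runs (dynamic shapes, config, torch version) can still force a fresh compile even with the cache directory in place.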