Is it possible to save the torch.compile state
so we don’t need to recompile each time we run the exact same model? Compilation takes over 15 minutes, and it’s presumably doing the same work each time. We are running Llama 3 70B with tensor parallelism and int8 quantization.
Do you have these set in your `.bashrc`?
export TORCHINDUCTOR_FX_GRAPH_CACHE=1
export TORCHINDUCTOR_CACHE_DIR=/home/ubuntu/.inductor_cache
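If you’d rather not rely on shell configuration, the same settings can be applied programmatically, as long as they are set before `torch` is imported. A minimal sketch (the cache directory path is illustrative; any writable location works):

```python
import os

# Enable Inductor's FX graph cache and point it at a persistent directory.
# These must be set BEFORE `import torch` so Inductor picks them up.
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"
os.environ["TORCHINDUCTOR_CACHE_DIR"] = os.path.expanduser("~/.inductor_cache")

# import torch                          # imported after the settings above
# compiled = torch.compile(model)       # later runs reuse the cached graphs
```

Note that the cache keys include things like the PyTorch version and compile options, so upgrading PyTorch or changing flags can still trigger a fresh compile.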