AOT compilation for GPU inference. Is it supported?

Is it possible to compile a PyTorch model for GPU inference and save the compiled model, so that it can be loaded later and JIT recompilation avoided?

Yes! You want to check out AOT Inductor if you're deploying in C++: https://github.com/pytorch/pytorch/tree/main/test/cpp/aot_inductor

If you're deploying with Python, something like this should work: [draft] fsspec code cache by msaroufim · Pull Request #106501 · pytorch/pytorch · GitHub

Thank you! I found the aot_compile and compile_fx_aot functions in the PyTorch Python code.
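For context, here is a minimal sketch of how aot_compile might be invoked on a small model. This assumes the private torch._export.aot_compile API from a nightly build at the time of writing; the signature is an assumption and may have changed, so the call is left commented out (it also requires a working C++ toolchain).

```python
import torch

class MLP(torch.nn.Module):
    """Tiny example model to stand in for a real inference workload."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = MLP().eval()
example_inputs = (torch.randn(1, 8),)

# Assumption: torch._export.aot_compile(model, example_inputs) ahead-of-time
# compiles the model and writes a shared library to disk, which can later be
# loaded for inference without triggering JIT recompilation. Uncomment on a
# machine with a C++ toolchain and a matching PyTorch nightly:
# so_path = torch._export.aot_compile(model, example_inputs)

# Eager execution still works as usual.
out = model(*example_inputs)
```

The idea is that the saved shared library, not the Python module, becomes the deployment artifact, which is what avoids recompilation on every process start.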

Yeah, this is all very hot off the press; it will probably get consolidated by the next major release.
