How to Persist JIT Fused CUDA Kernels State for Efficient Inference on Multi-GPU Setup?

So, I’ve just encountered the same issue as described in Missing Symbols When running AOTInductor example with Libtorch c++11 ABI. I suspect that `-D_GLIBCXX_USE_CXX11_ABI=1` needs to be passed to the AOT compiler so the generated code matches the ABI of the libtorch build. Is there a way to supply additional compiler options manually?
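
For context, here is a sketch of the mismatch I mean. `torch.compiled_with_cxx11_abi()` reports which ABI the installed PyTorch/libtorch build uses, and on the C++ side I currently have to match it by hand; the paths below are placeholders for my setup:

```shell
# Report which C++ ABI the installed libtorch build uses (True = cxx11 ABI)
python -c "import torch; print(torch.compiled_with_cxx11_abi())"

# When building my own C++ code against libtorch, I pass the matching define
# manually -- but I don't see how to do the same for the AOT-compiled kernels.
g++ main.cpp -o inference \
    -D_GLIBCXX_USE_CXX11_ABI=1 \
    -I/path/to/libtorch/include \
    -I/path/to/libtorch/include/torch/csrc/api/include \
    -L/path/to/libtorch/lib -ltorch -ltorch_cpu -lc10
```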