What is torch.compile’s current deployment story for non-Python host processes?
TorchScript/TorchJit used to have torch::jit::load and Python-free interpreting machinery.
Is there currently such a GIL-free graph/code interpreter?
Is torchdeploy supposed to help with that?
In general, how are TorchDynamo graphs exported? Can we first compile / generate the kernels and then deploy the kernels?
It seems that vanilla CPython does not support sub-interpreters with their own GIL yet (waiting for the PEP 684 implementation in Python 3.12), and even if the interpreter itself supports them, it’s unclear how NumPy / PyTorch would survive them if they use global state / registrations / allocator structures (see post #2 by guido in the “PEP 684: A Per-Interpreter GIL” thread on Discussions on Python.org).
So having some simple thread-safe, GIL-free, Python-like interpreter would be useful. What is the current PyTorch story for this use case?
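For context, here is a minimal Python sketch of the older TorchScript flow referred to above. `TinyModel` and the file name are hypothetical stand-ins; the saved archive is the kind of file that torch::jit::load opens on the C++ side.

```python
import os
import tempfile

import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for a real model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


model = TinyModel().eval()

# Compile the module to TorchScript and serialize it to disk.
scripted = torch.jit.script(model)
path = os.path.join(tempfile.mkdtemp(), "tiny_model.pt")  # assumed file name
scripted.save(path)

# Load the archive back; a C++ host would call torch::jit::load on the same file.
reloaded = torch.jit.load(path)
example = torch.randn(3, 4)
with torch.no_grad():
    outputs_match = torch.allclose(model(example), reloaded(example))
```

The question in this thread is what the equivalent of that last step looks like for torch.compile.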
It’s going to be
torch._inductor.aot_compile() for environments without and with Triton respectively. Neither is ready yet, but you can still poke around and see what’s there.
How are these “exported” bits supposed to be used? Are they supposed to run within some Python env? Or will there be some libtorch-based, Python-free interpreter for the exported bits? What artifact types are
It seems that sub-interpreters are finally getting implemented really soon in stock Python with PEP 684; see pytorch/pytorch issue #102517, “[feature request] PyTorch support for sub-interpreters with PEP 684 accepted and release in Python 3.12”.
I also wonder if torch::deploy / torch::package are related to this hosting scenario?
So for both of these the expectation is that they’d run without Python, but you should also be able to pybind them to run them from Python. I’m actually spending time this week trying to document the usability gaps here, so I’ll keep you posted.
torch::deploy I can’t comment on too much, since the repo (pytorch/multipy on GitHub) hasn’t been getting much activity lately. torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.
There is also the question of multi-threading with PyTorch interpreter instances (until sub-interpreters are fully supported in Python/NumPy/PyTorch, and without multipy, this is far from obvious). It should be clearly stated whether it’s supported, along with some best practices if it is (e.g. how to share the loaded model in memory between all threads / forked processes).
Sometimes this is important, e.g. when we have several models and complex pipelining.
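To make the sharing pattern concrete, here is a rough stdlib-only sketch of "load once, share between threads". `StandInModel`, `get_model` and `handle_request` are hypothetical names, and the sketch assumes the model's forward call is safe to invoke concurrently, which is exactly the guarantee that would need to be documented.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class StandInModel:
    """Hypothetical placeholder for a loaded model with read-only weights."""

    def __init__(self, weights):
        self.weights = tuple(weights)  # immutable, so threads cannot mutate it

    def __call__(self, inputs):
        # Dot product of the input with the shared weights.
        return sum(w * x for w, x in zip(self.weights, inputs))


load_count = 0
_load_lock = threading.Lock()
_model = None


def get_model():
    """Load the model once and hand the same instance to every thread."""
    global _model, load_count
    with _load_lock:
        if _model is None:
            # Stands in for an expensive load from disk.
            _model = StandInModel([0.5, -1.0, 2.0])
            load_count += 1
    return _model


def handle_request(inputs):
    return get_model()(inputs)


requests = [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(handle_request, requests))
```

In a standard CPython build these threads still take turns on the GIL for Python-level work, so this only shows the memory-sharing side of the question, not parallel speedup.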