PyTorch 2 and the C++ interface

ezyang · January 18, 2023, 12:18am

There are a number of different use cases for the C++ frontend, which are worth stepping through individually, since PT2 has different implications for each of them.

Write PyTorch-style code in C++. These users of the C++ API liked PyTorch’s Python API and want to code their models the same way they did in Python, but using torch::empty(), C++ NN Modules, etc. in C++, for lower overhead or removal of the GIL. There is no way for models written in this way to directly use Dynamo, since Dynamo is entirely predicated on Python bytecode analysis, and we have no plans for actually solving this. Additionally, in the limit, PT2 is supposed to remove all of the Python-side overhead that might have originally induced you to port your code to C++, so if you don’t have requirements for Python-less deployment (more on this below), we would hope that the next models you write can be done back in Python.

That being said, it is still possible to make use of PT2 as a tool. You have a few pathways for doing this:

You have identified a region of your graph which can profitably be compiled end-to-end with Inductor. You can capture these operators in Python, and then have Inductor export the fused kernel ahead-of-time, to be invoked from C++. This does not exist today but is on our roadmap for this half.
You could use lazy tensor to capture all of the operations and then hand them to our compiler stack. The compiler stack is still in Python, but at runtime, in principle, Python can be excluded from the hot path. You would run into some trouble if you needed dynamic shapes, but C++ code can be manually rewritten to symbolically trace integers if necessary.
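
To make "manually rewritten to symbolically trace integers" a bit more concrete, here is a minimal sketch of the idea in plain standard C++. It is only an illustration under assumptions: TracedInt, Node, and padded_numel are hypothetical names invented for this example and are not PyTorch APIs, and a real integration would go through whatever symbolic integer type the tracing stack provides. The point is that shape arithmetic written against a generic integer type can either compute a value eagerly or record an expression over an unknown size.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy types for illustration; none of this is PyTorch API.

// One node of a recorded integer expression.
struct Node {
    char op;                               // 'c' constant, 's' symbol, '+' or '*'
    std::int64_t value;                    // payload when op == 'c'
    std::string name;                      // payload when op == 's'
    std::shared_ptr<const Node> lhs, rhs;  // operands when op is '+' or '*'
};

// An integer that records how it was computed instead of holding a value.
class TracedInt {
public:
    using Env = std::unordered_map<std::string, std::int64_t>;

    // Implicit on purpose, so that literals in `n + 2` trace as constants.
    TracedInt(std::int64_t v)
        : node_(std::make_shared<Node>(Node{'c', v, "", nullptr, nullptr})) {}

    // A named unknown, e.g. a batch size that varies between calls.
    static TracedInt symbol(std::string name) {
        return TracedInt(std::make_shared<Node>(
            Node{'s', 0, std::move(name), nullptr, nullptr}));
    }

    friend TracedInt operator+(const TracedInt& a, const TracedInt& b) {
        return TracedInt(std::make_shared<Node>(Node{'+', 0, "", a.node_, b.node_}));
    }
    friend TracedInt operator*(const TracedInt& a, const TracedInt& b) {
        return TracedInt(std::make_shared<Node>(Node{'*', 0, "", a.node_, b.node_}));
    }

    // Renders the recorded expression, fully parenthesized.
    std::string str() const { return render(*node_); }

    // Evaluates the recorded expression once concrete sizes are known.
    std::int64_t eval(const Env& env) const { return evaluate(*node_, env); }

private:
    explicit TracedInt(std::shared_ptr<const Node> n) : node_(std::move(n)) {}

    static std::string render(const Node& n) {
        if (n.op == 'c') return std::to_string(n.value);
        if (n.op == 's') return n.name;
        assert(n.lhs && n.rhs);  // binary nodes always carry two operands
        return "(" + render(*n.lhs) + " " + n.op + " " + render(*n.rhs) + ")";
    }

    static std::int64_t evaluate(const Node& n, const Env& env) {
        if (n.op == 'c') return n.value;
        if (n.op == 's') return env.at(n.name);  // throws if the symbol is unbound
        assert(n.lhs && n.rhs);
        const std::int64_t a = evaluate(*n.lhs, env);
        const std::int64_t b = evaluate(*n.rhs, env);
        return n.op == '+' ? a + b : a * b;
    }

    std::shared_ptr<const Node> node_;
};

// Shape arithmetic written once against a generic integer type: with
// std::int64_t it computes eagerly, with TracedInt it records an expression.
template <typename Int>
Int padded_numel(Int batch, Int features) {
    return batch * (features + 2);
}
```

Calling padded_numel(TracedInt::symbol("batch"), TracedInt(16)) yields a value whose str() is "(batch * (16 + 2))", and evaluating it with batch bound to 8 gives 144, the same number that padded_numel<std::int64_t>(8, 16) computes directly. The "manual rewrite" is the step of replacing hard-coded integer types in existing shape code with a type parameter like this.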

C++ API as a deployment mechanism. We fully intend to support this via the “export” workflow. In export, we trace an entire model written in Python and produce it in some serialization format, which can be loaded by a C++ runtime to be executed. The exported model may or may not have had optimizations applied to it; this is up in the air. Our current work is on serialization to mobile devices, where Inductor-style compilation doesn’t make sense, but in this half we are also working on server-side export. You should be able to chain these exported models with other modules that consume the result.

Hope that helps.