Questions about compiling and exporting a model to TensorRT

pierisk · April 30, 2024, 8:46pm

I am trying to understand the differences between the various ways to compile/export a PyTorch model.

Background: My end goal is to export and use my detectron2 trained model as a TensorRT .engine file in NVIDIA frameworks, which got me into reading about TorchScript, torch.fx, torch.export, torch.compile, TorchDynamo with different backends e.g. TorchInductor and torch_tensorrt.

Since the model (mask2former with a SWIN transformer backbone) and the detectron2 framework in general have complex code constructs and dynamic control flow, I’ve ruled out torch.fx and all tracing methods (please correct me if my thinking is wrong).
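
To make the tracing concern concrete, here is a minimal sketch of the difference. The function `branchy` and its inputs are made up for illustration and are not taken from detectron2. `torch.jit.trace` records only the branch taken by the example input, while `torch.jit.script` compiles both branches:

```python
import torch


def branchy(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow: the branch depends on tensor values.
    if x.sum() > 0:
        return x * 2
    return x + 10


pos = torch.ones(3)
neg = -torch.ones(3)

# Tracing runs the function once with `pos` and records only the ops it saw,
# so the `x * 2` branch is baked in (PyTorch emits a TracerWarning here).
traced = torch.jit.trace(branchy, pos)

# Scripting compiles the Python source, so both branches are preserved.
scripted = torch.jit.script(branchy)

eager_out = branchy(neg)
traced_out = traced(neg)
scripted_out = scripted(neg)

print("eager:   ", eager_out)
print("traced:  ", traced_out)
print("scripted:", scripted_out)
```

On the negative input, the scripted function should agree with eager execution while the traced one silently follows the wrong branch. Whether a real model such as mask2former scripts cleanly is a separate question that this toy example does not answer.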

I’m now left with these questions:

1. Since my code has conditionals and dynamic control flow, is torch.jit.script the only “easy” option, given that other approaches hit graph breaks and I want to use the result outside the Python runtime?
2. Is torch.compile with the PyTorch model as input suitable for my goal (eventually serializing to a TensorRT engine file and using it outside Python), or should I first convert the model to TorchScript?
3. After compiling the model with any of the above methods, my understanding is that I still need to use torch_tensorrt to serialize the model. Is there another way? I’ve stumbled upon the torch2trt project by NVIDIA, but I’m not certain if it’s a better option.

Sorry for the long post and appreciate any help!

lime · May 8, 2024, 7:26pm

Pieri’s mate here,

We made some progress:

The other, more “beta” option right now would be to follow the approach in any of these links (1, 2, 3), which compile to FX IR (with nodes and all) using Dynamo, dynamic tracing and all the arcane magic spells.
Even though we have successfully compiled the model with the inductor backend, we’re still stuck on finding a way to serialize the optimized module that inductor outputs. I really doubt there are any paths to serialization from this IR.
NVIDIA’s efforts on this have been mainly for JetBot, and using the torch_tensorrt backend with Dynamo seems like a plausible compromise, but we’d still face the serialization issue (i.e. getting a TRT engine out of this).
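
As a rough sketch of what “getting a TRT engine out” could look like with the scripting route: torch_tensorrt exposes `convert_method_to_trt_engine`, which returns the serialized engine as bytes. The helper name `build_engine_file`, the input shape and the output path are hypothetical, and the code assumes torch_tensorrt is installed on a machine with a CUDA GPU. We have not verified that it works for mask2former.

```python
from pathlib import Path
from typing import Sequence


def build_engine_file(model, input_shape: Sequence[int], engine_path: str) -> Path:
    """Script `model` and write a serialized TensorRT engine to `engine_path`.

    Hypothetical helper: requires torch_tensorrt and a CUDA-capable GPU.
    """
    # Imported lazily so the module can be loaded without the GPU stack.
    import torch
    import torch_tensorrt

    # TorchScript keeps Python control flow; scripting may still fail on
    # constructs TorchScript does not support.
    scripted = torch.jit.script(model.eval().cuda())

    # Returns the serialized engine as bytes (TorchScript frontend).
    engine_bytes = torch_tensorrt.convert_method_to_trt_engine(
        scripted,
        method_name="forward",
        inputs=[torch_tensorrt.Input(shape=tuple(input_shape), dtype=torch.float32)],
        ir="ts",
    )

    out = Path(engine_path)
    out.write_bytes(engine_bytes)
    return out
```

A call might look like `build_engine_file(my_model, (1, 3, 224, 224), "model.engine")`, where the model and shape are placeholders. Operators that TensorRT cannot convert would make this step fail, since a standalone engine has no PyTorch fallback.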

We’d really appreciate some feedback on that and on the general usability of Dynamo right now. I really want to embrace this project, but it feels like it’s not there yet and we’re left with good old scripting.

I am shamelessly tagging @ezyang, one of the major celebs of this project.