Custom C++ extensions vs libtorch

Hello, I am interested in running model inference in C++. The model is trained in Python, and I have a few questions regarding libtorch vs. custom C++ extensions of PyTorch.

  1. What is the main difference between libtorch and custom C++ extensions of PyTorch (can both be used for inference)?

  2. Which one is better in terms of ease of use and inference performance?

  3. Can any PyTorch model be converted and used for inference in C++ using libtorch and/or C++ extensions?

  4. Finally, which method is recommended going forward?

P.S.: by custom C++ extensions I mean this

  1. libtorch is the C++ frontend while custom C++/CUDA extensions can be used in Python as modules or functional ops. Yes, both can be used for inference.

  2. @tom’s general rule is: “As long as your operands are reasonably large (say 100s of elements, not single elements), Python and Data “administrative overhead” probably isn’t your main problem.”
    This would mean that unless your actual workload is small, you shouldn’t see a huge difference between using the Python vs. the C++ interface, but tom can correct me here. :slight_smile:

  3. You could either script the model or write it directly in libtorch. As a quick test: try to export the model via torch.jit.script(model) and see if you are running into issues.

  4. libtorch and custom C++ extensions have a different purpose, so I don’t think it’s a choice between these two.
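Regarding point 3, the quick test can be sketched in a few lines. This is a minimal example, assuming a hypothetical toy module (`TinyNet`) standing in for the actual trained model:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the trained Python model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()

# The quick test: if this call raises, the model uses constructs
# TorchScript does not support and needs adjusting first.
scripted = torch.jit.script(model)

# The saved archive can later be loaded from C++ via torch::jit::load.
scripted.save("tiny_net.pt")
```

If `torch.jit.script` succeeds, the scripted module should behave like the eager one on the same inputs.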


As one would expect, @ptrblck’s advice is the best you’d get. :heart:

Indeed, typical speedups I have seen are 1-10% (more the former for convnets, more the latter for RNN-type models, e.g. the LLTM tutorial example when you just convert to C++ but don’t implement custom kernels). The other part of that rule of thumb is something I have seen materialize, too: if the Python overhead is relatively costly, then torch::Tensor creation at the C++ level likely is as well.

Best regards



Thank you @ptrblck @tom. My main motivation is to include deep learning inference in a pure C++ software stack. In light of your answers, is it right to conclude that scripting the model would do just fine? One more thing: with this method (scripting + libtorch), can any model be inferred in C++?

Some models are harder to get to TorchScript than others, but yes, that approach is very general for anything needing only Tensor, number, and/or string inputs.
Going fully to C++ (without the JIT) is likely only worth it if you don’t have a Python model or have reasons not to compile in the JIT.
There is some ongoing work on a static runtime for inference: [JIT][Static Runtime] Memory optimization for output tensors · Issue #53867 · pytorch/pytorch · GitHub; this could be of interest if it is applicable to your use case.
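To make the scripting + libtorch path concrete, here is a minimal round-trip sketch (the `nn.Sequential` model is a made-up stand-in for the trained one). The Python reload mirrors what the C++ side does with `torch::jit::load`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained model.
scripted = torch.jit.script(nn.Sequential(nn.Linear(8, 4), nn.ReLU()).eval())
scripted.save("model.pt")

# Reload the saved archive; the equivalent in C++ is roughly:
#   torch::jit::Module m = torch::jit::load("model.pt");
reloaded = torch.jit.load("model.pt")
```

The saved `.pt` archive contains both the code and the weights, so the C++ process needs no Python at all, only libtorch.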
