ONNX: deploying a trained model in a C++ project

I expect that most people are using ONNX to transfer trained models from PyTorch to Caffe2 because they want to deploy their model as part of a C/C++ project. However, there are no examples showing how to do this from beginning to end.

From the PyTorch documentation here, I understand how to convert a PyTorch model to ONNX format using torch.onnx.export, and also how to load that file into Caffe2 using onnx.load + onnx_caffe2.backend… but that’s just the Python side of things.

I also understand something about loading pretrained Caffe2 models into a C++ project from .pb files as described in the Caffe2 C++ tutorials here. (Click on the pretrained.cc link.)

What I’m missing are the steps in between. How do I take the output from onnx_caffe2.backend and create the .pb files found in the Caffe2 C++ tutorials? Maybe these steps are obvious to a seasoned Caffe2 user, but this is my first exposure. A step-by-step recipe would probably help a lot of Pytorch users.


Does anyone know how to do this? It seems like there was a lot of fanfare around ONNX, but not much support.

Actually, with the ONNX-Caffe2 package, you can easily turn an ONNX model into a Caffe2 model and then dump it into .pb files.

Here is an example:

import onnx
import torch
from onnx_caffe2.backend import Caffe2Backend

onnx_proto_file = "/onnx.proto"

# G is the trained model and x a sample input tensor
torch.onnx.export(G, x, onnx_proto_file, verbose=True)

onnx_model = onnx.load(onnx_proto_file)
init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(onnx_model.graph)

with open("onnx-init.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("onnx-init.pbtxt", "w") as f:
    f.write(str(init_net))
with open("onnx-predict.pb", "wb") as f:
    f.write(predict_net.SerializeToString())
with open("onnx-predict.pbtxt", "w") as f:
    f.write(str(predict_net))

This is really great! Thank you!

I think this convert-onnx-to-caffe2 tool is what you are looking for: https://github.com/onnx/tutorials/blob/master/tutorials/OnnxCaffe2Import.ipynb

For future reference, one could look at these two advanced tutorials from the documentation:

I know this thread is now quite old, but did this work out, @abweiss? Were you able to import the model from Caffe2 C++?

I think inference using onnxruntime is the ultimate solution instead…

Jin, could you elaborate? 1. What is the current best option for deploying Python-trained models to a high-performance C++ runtime (ideally supporting accelerators)? 2. What option will the PyTorch team recommend in the future (6-12 months)? I.e., where are we now, and where will we be in 6-12 months? Will ONNX be replaced by libtorch?

Sorry for the late reply.

  1. I think the best way to deploy a Python-trained model currently depends on your target platform. If you want to use a GPU, the fastest way is TensorRT; if you are on a CPU, use something like QNNPACK instead;

  2. The PyTorch team should encourage us to export to ONNX for deployment.

@houseroad Thank you! But how can I then import the .pb files into TensorRT? (I am asking because TensorRT has poor support for ONNX.)

Hello everyone,
I must confess I am a total newbie to the PyTorch framework and neural networks in general. I have been doing some reading and practice for about two months now. I am working on my thesis and need to implement some classification at some point. I currently have a pretrained model in Caffe and would like to convert it to the C++ version of PyTorch, since I am a bit comfortable with PyTorch now.

I would be glad if someone could direct me to a reading resource/library/tutorial that does this: Caffe2 –> C++ Torch. Most of what I have found covers loading from Caffe into PyTorch in Python, not C++.

Secondly, should I succeed in the conversion, will the new model in Torch still be a trained model, or would I have to retrain it?
I’ll be glad if someone could give me a prompt response, as I am running out of time. Thanks in advance. @jinfagang @houseroad

Hi Folks,
A few comments and pointers. Caffe2 is unsupported at this point, so if you are going from PyTorch to ONNX, you can use the exporter directly (which is well supported by MSFT). A tutorial you can use is here: https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html

For TensorRT, with the improvements in the ONNX exporter, the experience should be better now. We can ping Nvidia folks if there are any issues: https://github.com/onnx/onnx-tensorrt

Generally speaking, for backend performance libraries (relating to the QNNPACK comment above), the following is guidance depending on the target platform:

Server CPU (x86) - MKL-DNN and FBGEMM (8-bit integer quantized)

ARM/Neon CPU - QNNPACK (8-bit integer quantized, ready today) and XNNPACK (current WIP for integration)
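As a small aside on the CPU backends above: PyTorch lets you query and select the quantized engine at runtime. Which engines appear depends on how your PyTorch build was compiled, so this sketch just picks whichever one is available rather than hard-coding a name.

```python
import torch

# Engines compiled into this build, e.g. ['none', 'fbgemm', 'qnnpack'].
engines = [e for e in torch.backends.quantized.supported_engines if e != "none"]

if engines:
    # Select one (on x86 servers this is typically fbgemm, on ARM qnnpack).
    torch.backends.quantized.engine = engines[0]

print(torch.backends.quantized.engine)
```

Quantized models then dispatch to the selected engine's kernels when run on that machine.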

Hope this helps! Happy NY!!


What is XNNPACK, btw? A Xilinx board?

One can use a simpler approach with the deepC compiler and convert the exported ONNX model to C++.

Check out the simple example at the deepC compiler sample test.

Compile the ONNX model for your target machine.

Check out mnist.ir.

Step 1: Generate intermediate code

% onnx2cpp mnist.onnx

Step 2: Optimize and compile

% g++ -O3 mnist.cpp -I ../../../include/ -isystem ../../../packages/eigen-eigen-323c052e1731/ -o mnist.exe

Step 3: Test run

% ./mnist.exe

Here is a brief talk on how to compile PyTorch models to a C++ project on MCUs
