Best way to port a 100% C++ torch::nn::Module to JIT TorchScript/TensorRT?

Hello,

I am working with libtorch (C++), and I want to port a model to TensorRT.

My model is defined in C++.
It has some control flow, such as for loops and ifs, in some methods to optionally apply an (externally computed) PCA matrix to the model, plus two methods to save/restore the best weights.

I have read that I need to go through the TorchScript format to be able to either import the model into Torch-TensorRT (C++ code seems to be available) or convert it to an ONNX model that TensorRT can import.

  • For the second option, I looked at ways to create an ONNX model on the fly (my model is mainly a stack of n linear layers), but onnxruntime does not offer enough of an API to create a model from scratch, and the main ONNX library is not ported to C++.

  • That brings me back to the first option: making a TorchScript module from my existing model.
    First, would it be possible to convert it straight away? Or could it be done after tweaking the conditional expressions and removing the save/restore weights functions (I am aware that if I want to trace the model, I need to do this anyway)?

  1. I tried to do so with:

torch::serialize::OutputArchive differentialModelOut;
model->to(torch::kCPU);
model->save(differentialModelOut);  // writes the parameters/buffers into the archive
modelsArchiveOut.write("DifferentialNetModel", differentialModelOut);

But that did not seem to produce a file that I could load in Python or in Torch-TensorRT.
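As far as I understand, torch::nn::Module::save only serializes the parameters and buffers into the archive (a checkpoint), not the code of the model, so the resulting file is not something torch::jit::load or Torch-TensorRT can consume. A minimal sketch of what that archive is actually good for, i.e. restoring into the same C++ module definition (file name made up):

modelsArchiveOut.save_to("differential_net_checkpoint.pt");  // plain checkpoint, not TorchScript

// The archive can only be restored into the identical C++ module definition:
torch::serialize::InputArchive modelsArchiveIn;
modelsArchiveIn.load_from("differential_net_checkpoint.pt");
torch::serialize::InputArchive differentialModelIn;
modelsArchiveIn.read("DifferentialNetModel", differentialModelIn);
model->load(differentialModelIn);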

  2. Can I manage to store the PCA tensor and integrate a matmul at the end of the forward, and still end up with a valid TorchScript module convertible to ONNX / TensorRT? (A rough sketch of what I mean is below.)
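For context, this is roughly how I would store the PCA matrix and apply it conditionally in the C++ module (a simplified sketch, the real class is larger and the names differ):

#include <torch/torch.h>

struct DifferentialNetImpl : torch::nn::Module {
  DifferentialNetImpl(int64_t in, int64_t hidden, int64_t out) {
    fc1 = register_module("fc1", torch::nn::Linear(in, hidden));
    fc2 = register_module("fc2", torch::nn::Linear(hidden, out));
    // The PCA matrix is computed externally; a registered buffer is serialized with the model.
    pca = register_buffer("pca", torch::eye(out));
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x));
    x = fc2->forward(x);
    if (applyPca) {  // this is the kind of conditional I am unsure about
      x = torch::matmul(x, pca);
    }
    return x;
  }

  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
  torch::Tensor pca;
  bool applyPca = false;
};
TORCH_MODULE(DifferentialNet);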

I have also read about another option: creating a torch::jit::Module that is imported and trained in C++. As that would be a massive rewrite, I would rather stick to converting my existing, pre-trained model to TorchScript.
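For completeness, my understanding is that this second option means scripting and saving the module from Python once, then loading and training it with libtorch, roughly like this (paths, shapes, and the loss are placeholders):

#include <torch/script.h>
#include <torch/torch.h>
#include <vector>

int main() {
  // Load a TorchScript module that was scripted and saved on the Python side.
  torch::jit::Module scripted = torch::jit::load("scripted_model.pt");
  scripted.train();

  // Hand its parameters to a libtorch optimizer.
  std::vector<torch::Tensor> params;
  for (const auto& p : scripted.parameters()) {
    params.push_back(p);
  }
  torch::optim::Adam optimizer(params, torch::optim::AdamOptions(1e-3));

  // One dummy training step.
  torch::Tensor inputs = torch::randn({8, 16});
  torch::Tensor targets = torch::randn({8, 4});
  optimizer.zero_grad();
  torch::Tensor output = scripted.forward({inputs}).toTensor();
  torch::Tensor loss = torch::mse_loss(output, targets);
  loss.backward();
  optimizer.step();
  return 0;
}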

Thank you for reading this; any comments are appreciated.

As an update, I ended up rewriting the network directly in TensorRT; this is not too hard for a model with standard layers (I use fully-connected, ReLU, and batch norm, the latter done via a scale layer). It relies only on C++, and it is much simpler.

To achieve this, simply make a TensorRT builder, create a network definition, and add layers one by one, copying the weights from the existing model.
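Roughly, the construction looks like this (a simplified sketch of the idea rather than my exact code: TensorRT 8.x C++ API, a single Linear + ReLU shown, with the fully-connected layer expressed as a matrix multiply plus a bias add; cleanup and error checks omitted; the loop over layers and the batch-norm scale/shift follow the same pattern):

#include <NvInfer.h>
#include <torch/torch.h>
#include <iostream>

class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
  }
};

// Wrap a contiguous CPU float tensor as TensorRT weights.
// TensorRT reads the pointer at build time, so keep the tensors alive until then.
nvinfer1::Weights toWeights(const torch::Tensor& t) {
  return nvinfer1::Weights{nvinfer1::DataType::kFLOAT, t.data_ptr<float>(), t.numel()};
}

void buildEngine(torch::nn::Linear linear, int32_t inFeatures, int32_t outFeatures) {
  Logger logger;
  auto* builder = nvinfer1::createInferBuilder(logger);
  auto* network = builder->createNetworkV2(
      1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

  auto* input = network->addInput("input", nvinfer1::DataType::kFLOAT,
                                  nvinfer1::Dims2{1, inFeatures});

  // Copy the weights of the existing torch::nn::Linear: y = x * W^T + b.
  torch::Tensor w = linear->weight.detach().cpu().contiguous();  // [out, in]
  torch::Tensor b = linear->bias.detach().cpu().contiguous();    // [out]
  auto* wConst = network->addConstant(nvinfer1::Dims2{outFeatures, inFeatures}, toWeights(w));
  auto* matmul = network->addMatrixMultiply(*input, nvinfer1::MatrixOperation::kNONE,
                                            *wConst->getOutput(0),
                                            nvinfer1::MatrixOperation::kTRANSPOSE);
  auto* bConst = network->addConstant(nvinfer1::Dims2{1, outFeatures}, toWeights(b));
  auto* biased = network->addElementWise(*matmul->getOutput(0), *bConst->getOutput(0),
                                         nvinfer1::ElementWiseOperation::kSUM);
  auto* relu = network->addActivation(*biased->getOutput(0), nvinfer1::ActivationType::kRELU);

  network->markOutput(*relu->getOutput(0));

  auto* config = builder->createBuilderConfig();
  auto* serialized = builder->buildSerializedNetwork(*network, *config);
  // ... deserialize `serialized` with nvinfer1::createInferRuntime() and run inference ...
}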