Convert a model trained with libtorch 1.1.0 (C++) to ONNX

Hi,

TL;DR: how do I convert a .pt model file saved from C++ libtorch to ONNX?

I have a model, model.pt, that I trained in C++ with libtorch and saved with torch::save().
I load it in a C++ application with torch::load(), and it works just fine.

I would like to convert it to ONNX.

I cannot find documentation on how to do it, nor can I find documentation about the format of the .pt model files.

I’ve tried to use the code from this article to import my model into Python and then call torch.onnx.export() to produce the ONNX model:
https://www.learnopencv.com/pytorch-model-inference-using-onnx-and-caffe2/

First, the torch.load() call in the Python script resulted in an error suggesting that I use torch.jit.load() instead:
model.pt is a zip archive (did you mean to use torch.jit.load()?)

I did that, and it passed the load() step.

But then the torch.onnx.export() step gives me:
RuntimeError: ‘forward’ method must be a script method

I’m confused.
I’ve read the documentation about TorchScript, and how to use tracing and ScriptModule in Python to then read them in C++.

But judging from the error message, torch::save() does not seem to generate the same format as its Python counterpart, since I cannot load the file with torch.load() in Python.
Since torch.jit.load() seemed to work, I assumed that in C++ the model was already converted to TorchScript (or something similar) when saved with torch::save().
(But then what are the torch::jit::save/load methods for?)

But now that the ONNX conversion tells me the method is not a script method, I wonder how to fix this.
In the documentation I found that to make a method a script method, one has to use @torch.jit.script_method.
https://pytorch.org/tutorials/beginner/hybrid_frontend/learning_hybrid_frontend_through_example_tutorial.html?highlight=torch%20jit
But that’s for Python. How am I supposed to apply it to my C++ forward method?

Is there documentation of the file formats generated by torch.save (Python) and torch::save (C++), if they differ?
I can see my model is an archive I can unzip, with the following layout:

├── attributes.pkl
├── model.json
├── tensors
│   ├── 0
...
│   └── 99
└── version
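
For anyone who wants to reproduce this inspection from Python, here is a minimal sketch using only the standard zipfile module, so it does not depend on PyTorch at all. The function name describe_archive is hypothetical, and the sketch assumes the version entry is a small text file that may sit either at the root of the archive or under a top-level folder.

```python
import zipfile


def describe_archive(path):
    """Return (sorted entry names, version string or None) for a zip-based .pt file.

    Hypothetical helper: it only reads the zip container and makes no
    assumption about how PyTorch interprets the entries.
    """
    with zipfile.ZipFile(path) as archive:
        names = sorted(archive.namelist())
        version = None
        for name in names:
            # Match on the last path component, since the entry may be
            # nested under a top-level folder inside the archive.
            if name.rsplit("/", 1)[-1] == "version":
                version = archive.read(name).decode("utf-8").strip()
                break
    return names, version
```

Calling describe_archive("model.pt") would return the list of entries and whatever the version entry contains, which at least makes it easy to compare a file written from C++ with one written from Python.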

And I get that torch.save()/load() uses Pickle in Python.

I think I saw an initiative to use Pickle in C++ as well. Is the goal to be able to torch::load in C++ models that were saved in Python with torch.save? Would it then work the other way around?
(Merged in v1.3.0, see Pull Request 23241, commit 75c1419b46624e2bcd01709d93def0bceaaf05a2.)

In short, I have a model that I believe I have managed to import into Python, but I cannot convert it to ONNX with torch.onnx.export().

Thanks for your help.

Libtorch version: 1.1.0

Edit: I saw on the “Serialization semantics” page that saving the whole model, and not just the parameters, can lead to this:
“However in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.”

Is this also the case with libtorch?

OK, about @torch.jit.script_method: it seems the documentation wasn’t up to date here and there, which is why I was confused about TorchScript. Sometimes modules needed to inherit from torch.jit.ScriptModule, sometimes not, and I thought this was a typo…
So I’ve learned that starting from 1.2, there is no need to inherit from ScriptModule; one uses torch.jit.script(model) instead.
https://pytorch.org/docs/stable/jit.html#migrating-to-pytorch-1-2-recursive-scripting-api

So maybe I should, in C++:
- update my PyTorch version (I’m using 1.1.0)
- load my trained model .pt
- use torch::jit::script() on it
- save it
- retry the Python conversion script to ONNX

I hope this does not require restarting my training, and that I can use my .pt as is.

What do you think? (I’m still open to answers to my other questions, to better understand how model I/O works.)

Thanks.

Edit: OK, I guess that’s not how it works: I cannot find torch::jit::script() in C++.