Deploying an ONNX model with TorchServe

thisisjim2 · September 29, 2020, 12:54pm

Hi, I am currently looking at ways to deploy an ONNX model, mainly because inference is a lot faster. I took a look at TorchServe, which has many features that I would like in production (logging, batch inference, version control, etc.). Does anyone know if it’s possible to deploy an ONNX model with TorchServe?

marksaroufim · February 3, 2022, 10:01pm

There’s an easy way to do this: load the model from a TorchServe handler. Handlers are quite general in what they let you use. There’s probably a better way, but at a high level the solution would look something like this:

    import onnxruntime as ort

    def load_model(self, model_path):
        # Build an ONNX Runtime session from the serialized .onnx file
        options = ort.SessionOptions()
        return ort.InferenceSession(model_path, options)
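
For context, a fuller handler built around that `load_model` might look like the sketch below. It assumes TorchServe's class-style custom handler (a class exposing `initialize` and `handle`), a single-input model, and a hypothetical class name `OnnxHandler`. `preprocess` is left as a placeholder because it depends on the model.

```python
import os


class OnnxHandler:
    """Hypothetical TorchServe custom handler backed by ONNX Runtime."""

    def __init__(self):
        self.session = None
        self.input_name = None

    def load_model(self, model_path):
        # Imported here so the file still imports where onnxruntime is absent.
        import onnxruntime as ort

        return ort.InferenceSession(model_path, ort.SessionOptions())

    def initialize(self, context):
        # TorchServe calls this once per worker with the unpacked archive's details.
        model_dir = context.system_properties.get("model_dir")
        serialized_file = context.manifest["model"]["serializedFile"]
        self.session = self.load_model(os.path.join(model_dir, serialized_file))
        # Assumes a single-input model; multi-input models need one feed per input.
        self.input_name = self.session.get_inputs()[0].name

    def preprocess(self, requests):
        # Placeholder: turning request payloads into a NumPy batch is model-specific.
        raise NotImplementedError

    def handle(self, data, context):
        batch = self.preprocess(data)
        # Passing None as the first argument asks ONNX Runtime for all outputs.
        outputs = self.session.run(None, {self.input_name: batch})
        # TorchServe expects a list with one entry per request in the batch.
        return [row.tolist() for row in outputs[0]]
```

Keeping `load_model` as its own method makes it easy to swap in different session options or execution providers later without touching the request path.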

Mohamed_Nabil1 · April 18, 2023, 1:35pm

Hi Mark,
That seems pretty plausible to me.
However, I’m trying to use torch-model-archiver. Would that simply work?

marksaroufim · April 18, 2023, 4:18pm

Oh yeah, you can just pass an ONNX file to the archiver like this:

    torch-model-archiver -f --model-name onnx --version 1.0 --serialized-file linear.onnx --export-path model_store --handler onnx_handler.py

This test might help: test_onnx.py in the pytorch/serve repository on GitHub (master branch). You’ll still need to handle preprocessing the request data into the form the ONNX Runtime session expects.
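
As a rough illustration of that preprocessing step, the sketch below decodes TorchServe request payloads and stacks them into a NumPy array. It assumes each request carries a JSON list of numbers under the `data` or `body` key and that the model takes a single float32 input; the function names `decode_requests` and `to_ort_input` are hypothetical.

```python
import json


def decode_requests(requests):
    """Extract one list of numbers per TorchServe request."""
    rows = []
    for request in requests:
        # TorchServe places the payload under "data" or "body".
        payload = request.get("data")
        if payload is None:
            payload = request.get("body")
        # Raw payloads arrive as bytes; JSON payloads may already be parsed.
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")
        if isinstance(payload, str):
            payload = json.loads(payload)
        rows.append(payload)
    return rows


def to_ort_input(rows):
    """Stack decoded rows into a float32 array for ONNX Runtime."""
    import numpy as np  # imported lazily; onnxruntime itself depends on NumPy

    # NumPy defaults to float64 for Python floats, so the dtype is set
    # explicitly to match a model exported with float32 inputs.
    return np.asarray(rows, dtype=np.float32)
```

The result of `to_ort_input(decode_requests(data))` can then be passed to `session.run` as the value for the model's input name.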