Deploying an ONNX model with TorchServe

Hi, I am currently looking at ways to deploy an ONNX model, simply because inference speed is a lot faster. I took a look at TorchServe, which has many features that I would like in production (logging, batch inference, version control, etc.). Does anyone know if it’s possible to deploy an ONNX model with TorchServe?

There’s an easy way to do this by just loading the model from a TorchServe handler, which is quite general in what it lets you do. There’s probably a better way to do this, but at a high level the solution would look something like

    def load_model(self, model_path):
        # assumes `import onnxruntime as ort` at the top of the handler file
        options = ort.SessionOptions()
        return ort.InferenceSession(model_path, options)
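
To make that concrete, a full custom handler built around that idea could look roughly like the sketch below. This is only a sketch, not code from the repo: the ONNXHandler class name is made up, and the preprocess/postprocess steps assume a toy model that takes a flat list of floats per request, so you’d adapt those to your own inputs.

    # onnx_handler.py: sketch of a TorchServe handler that runs an ONNX model
    import os

    import numpy as np
    import onnxruntime as ort
    from ts.torch_handler.base_handler import BaseHandler


    class ONNXHandler(BaseHandler):
        def initialize(self, context):
            # TorchServe unpacks the .mar archive and tells us where it lives
            model_dir = context.system_properties.get("model_dir")
            serialized_file = context.manifest["model"]["serializedFile"]
            self.session = self.load_model(os.path.join(model_dir, serialized_file))
            self.initialized = True

        def load_model(self, model_path):
            options = ort.SessionOptions()
            return ort.InferenceSession(model_path, options)

        def preprocess(self, data):
            # Assumes each request carries a list of floats under "data" or "body";
            # real preprocessing depends on what your model expects
            rows = [req.get("data") or req.get("body") for req in data]
            return np.array(rows, dtype=np.float32)

        def inference(self, inputs):
            # Feed the batch to ONNX Runtime using the model's first input name
            input_name = self.session.get_inputs()[0].name
            return self.session.run(None, {input_name: inputs})

        def postprocess(self, outputs):
            # TorchServe expects one response entry per request in the batch
            return [row.tolist() for row in outputs[0]]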

Hi Mark,
That seems pretty plausible to me.
However, I’m trying to use torch-model-archiver; would that simply work?

Oh yeah, you can just pass an ONNX file to the archiver like this:

torch-model-archiver -f --model-name onnx --version 1.0 --serialized-file linear.onnx --export-path model_store --handler onnx_handler.py

This test might help: serve/test_onnx.py at master · pytorch/serve · GitHub. You’ll still need to worry about preprocessing the data for the ONNX Runtime session, though.
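
If it helps, here’s a rough sketch of how a linear.onnx like the one in that command could be produced; the toy Linear(4, 2) model and the dynamic batch axis are just assumptions, and the archiver command above then packages the file together with onnx_handler.py.

    # export_linear.py: sketch that exports a toy linear model to linear.onnx
    import torch

    model = torch.nn.Linear(4, 2)
    dummy_input = torch.randn(1, 4)
    torch.onnx.export(
        model,
        dummy_input,
        "linear.onnx",
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )

Once the .mar is in model_store, something like torchserve --start --model-store model_store --models onnx.mar should serve it.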
