Hi, I am currently looking at ways to deploy an ONNX model, mainly because inference is a lot faster. I took a look at TorchServe, which has many features that I would like in production (logging, batch inference, version control, etc.). Does anyone know if it’s possible to deploy an ONNX model with TorchServe?
There’s an easy way to do this: load the model from a custom TorchServe handler. Handlers are quite general in what they let you run. There’s probably a better way, but at a high level the solution would look something like this:
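import onnxruntime as ort

def load_model(self, model_path):
    # Method on a custom handler: build an ONNX Runtime session
    # instead of loading a PyTorch model.
    options = ort.SessionOptions()
    return ort.InferenceSession(model_path, options)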
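To put that snippet in context, here is a rough sketch of what a complete handler might look like. It follows TorchServe's custom-handler convention of an `initialize(context)` and a `handle(data, context)` method. The class name `OnnxHandler`, the `session_factory` argument, and the file name `model.onnx` are my own assumptions for illustration, not anything TorchServe prescribes. It also assumes each request payload is already in a form ONNX Runtime accepts (in practice you would decode the bytes or JSON into a NumPy array first).

```python
import os


def create_session(model_path):
    """Build an ONNX Runtime session (imported lazily, only when actually needed)."""
    import onnxruntime as ort

    return ort.InferenceSession(model_path, ort.SessionOptions())


class OnnxHandler:
    """Hypothetical custom handler that serves an ONNX model."""

    def __init__(self, session_factory=create_session):
        # session_factory is an illustrative hook so the session can be swapped out.
        self.session_factory = session_factory
        self.session = None
        self.input_name = None

    def initialize(self, context):
        # TorchServe passes a context whose system_properties include "model_dir".
        model_dir = context.system_properties.get("model_dir")
        # Assumption: the archive contains a file named "model.onnx".
        self.session = self.session_factory(os.path.join(model_dir, "model.onnx"))
        self.input_name = self.session.get_inputs()[0].name

    def handle(self, data, context):
        # data is a list with one entry per request in the batch.
        outputs = []
        for row in data:
            payload = row.get("data")
            if payload is None:
                payload = row.get("body")
            # Assumption: payload is already a valid model input (no decoding here).
            result = self.session.run(None, {self.input_name: payload})
            first = result[0]
            # Convert NumPy arrays to plain lists so the response can be serialized.
            outputs.append(first.tolist() if hasattr(first, "tolist") else first)
        return outputs
```

The handler returns one output per request, which is what TorchServe expects from `handle`. If your model has several inputs or outputs, the `run` call and the post-processing would need to change accordingly.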