I have a trained model (JIT-traced) now, and yes, I can serve it using Flask etc… but every time inference is run, the model gets loaded again. I would like to avoid this model load on every request and keep the model in memory somehow. What are the suggested approaches for this? I could not find any on the PyTorch site.
Perhaps you have already checked these links: DEPLOYING PYTORCH IN PYTHON VIA A REST API WITH FLASK and PyTorch Flask API. In particular, the second one shows that the load_model function can be written somewhere other than the script that launches Flask directly.
That way, Flask won’t reload the model on every request.
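A minimal sketch of that pattern, assuming the model is loaded with `torch.jit.load` at module level (once, when the server process starts) rather than inside the request handler. The `Doubler` module, file name `model.pt`, and `/predict` route are placeholders for illustration — in practice you would load your own traced model file:

```python
import torch
from flask import Flask, request, jsonify

# For a self-contained demo, trace and save a tiny toy module first;
# in practice this file already exists (your JIT-traced model).
class Doubler(torch.nn.Module):
    def forward(self, x):
        return x * 2

torch.jit.trace(Doubler(), torch.ones(1)).save("model.pt")

app = Flask(__name__)

# Loaded ONCE at import/startup time, not per request.
model = torch.jit.load("model.pt")
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Each request only runs inference; the model stays in memory.
    inputs = torch.tensor(request.get_json()["inputs"])
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(inputs)
    return jsonify({"outputs": outputs.tolist()})

if __name__ == "__main__":
    app.run()
```

As long as you run a single long-lived server process (e.g. one gunicorn worker, or Flask's built-in server), the load happens once; with multiple workers, each worker loads its own copy once at startup.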
Hope it helps somehow.