How to keep a model in memory when serving/deploying

I have a trained (JIT-traced) model, and yes, I can serve it using Flask etc., but the model is loaded from disk on every inference call. I would like to avoid this repeated load and keep the model in memory somehow. What are the suggested approaches for this? I could not find any on the PyTorch site.

Hello @John_J_Watson,

Perhaps you have already checked these links: DEPLOYING PYTORCH IN PYTHON VIA A REST API WITH FLASK and PyTorch Flask API. In particular, the second one shows that the load_model call can live outside the request handler, at module level rather than in the code Flask runs per request.
That way Flask won't reload the model again and again; see the sketch below.
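
For illustration, here is a minimal sketch of that pattern, assuming a TorchScript file named model.pt, a /predict route, and a JSON payload of the form {"input": [...]}; all of these names and formats are placeholders to adapt to your own model:

```python
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)

# Loaded exactly once, when the module is imported;
# every subsequent request reuses the same in-memory model.
model = torch.jit.load("model.pt")  # hypothetical path
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical payload format: {"input": [[...], [...]]}
    data = request.get_json()
    x = torch.tensor(data["input"], dtype=torch.float32)
    with torch.no_grad():  # no gradients needed for inference
        output = model(x)
    return jsonify({"output": output.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Note that if you later run this behind a production server such as gunicorn, each worker process loads its own copy of the model once at startup, which is still far cheaper than loading it per request.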

Hope it helps somehow.
Jian