I’ve trained a pretty cool model I want to share, and I’ll do that via a server (Flask).
Maybe I’m overthinking it, but:
Let’s say that my model takes about 1 GB of RAM once loaded (net = torch.load(…)).
What happens if, say, 50 people try to use my model at the same time, and my server has only 8 GB of RAM? (So I can’t actually load a separate copy of the model for each thread; and let’s say that making my users wait in a queue is really a last resort, something I’ll do only if nothing else works at all.)
The actual input to the model is an image, so is it reasonable to assume it will be a very rare event for 2 different users’ POST requests to hit the model at exactly the same time?
If that assumption is reasonable, how bad is it to share the model between the threads (say, as a global object of the entire process)?
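To make the “global object” idea concrete, here’s a minimal stdlib-only sketch of the pattern I have in mind. A placeholder predict function stands in for the real torch model, and the lock around inference is an assumption on my part (only needed if the model’s forward pass isn’t thread-safe):

```python
import threading

# Load the model ONCE at process startup; every request thread shares it,
# so memory cost is ~1 GB total, not 1 GB per thread.
# Placeholder for: net = torch.load("model.pt"); net.eval()
MODEL = {"name": "placeholder-net"}

# Serialize access in case inference isn't thread-safe (assumption).
MODEL_LOCK = threading.Lock()

def predict(image_bytes: bytes) -> int:
    """Stand-in for running the shared model on one uploaded image."""
    with MODEL_LOCK:
        # Real code would be roughly:
        #   with torch.no_grad():
        #       out = net(preprocess(image_bytes))
        return len(image_bytes) % 10

def handle_request(image_bytes: bytes, results: list) -> None:
    """Stand-in for a Flask view function handling one POST request."""
    results.append(predict(image_bytes))

# Simulate several simultaneous POST requests hitting the same process.
results: list = []
threads = [threading.Thread(target=handle_request, args=(b"x" * i, results))
           for i in range(1, 6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # every request was served by the one shared MODEL
```

In a real Flask app the model would be loaded at module import time, so all worker threads in the process see the same copy; whether the lock is needed at all depends on whether the model’s inference is thread-safe.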
What’s the best approach? Any other tips/further directions I should pursue?