gunicorn is run as gunicorn app:app --preload --workers 3.
--preload is used so that resources loaded at import time (the model here) are shared among the workers.
OMP_NUM_THREADS is set to 2.
app.py contains the following code:
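Concretely, the server is started like this (a sketch, assuming a bash-style shell; the environment variable is exported before gunicorn starts so every worker inherits it):

```shell
# limit OpenMP intra-op threads per process
export OMP_NUM_THREADS=2

# load the app once in the master, then fork 3 workers
gunicorn app:app --preload --workers 3
```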
from flask import Flask, jsonify
import torch
from create_model import test
app = Flask(__name__)
model = torch.load('model.pt')
@app.route('/predict', methods=['POST', 'GET'])
def prediction():
    constant_input = torch.randn(20, 16, 50, 100)
    prediction = model(constant_input)
    # a tensor is not JSON-serializable, so convert it to a plain list first
    return jsonify(prediction.tolist())
model.pt is created using create_model.py, which contains:
import torch
import torch.nn as nn
import torch.nn.functional as F
class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.conv1 = nn.Conv2d(16, 33, 3, stride=2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x

m = test()
input = torch.randn(20, 16, 50, 100)
print(m(input))
torch.save(m, 'model.pt')
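One detail worth noting about this save format: torch.save(m, 'model.pt') pickles the whole module by reference to its class, which is why app.py has to import test from create_model before torch.load can work. A state_dict-based round trip (a sketch below, with a hypothetical weights.pt filename) avoids that coupling, though it should not by itself change the hanging behavior:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.conv1 = nn.Conv2d(16, 33, 3, stride=2)

    def forward(self, x):
        return F.relu(self.conv1(x))

m = test()
# save only the parameters, not the pickled class
torch.save(m.state_dict(), 'weights.pt')

# loading: rebuild the module from its class, then restore the parameters
m2 = test()
m2.load_state_dict(torch.load('weights.pt'))
m2.eval()
```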
But with this setup I am not able to get any inference response.
If, instead of calling the torch model, I run some numpy operation and just return its output, the endpoint works fine.
Also, with gunicorn app:app --preload --workers 3 --threads 2 inference works. Can anyone please explain why it only works when threads are used, and why this happens only with PyTorch?
Thanks.