I start gunicorn with: gunicorn app:app --preload --workers 3
--preload is used so that resources loaded by the application are shared among the workers.
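As a rough illustration of what --preload implies (the application is loaded once in the master process, before the workers are forked), here is a minimal sketch using only os.fork. The names load_resource and run_preloaded_workers are hypothetical and not part of gunicorn. The sketch is Unix-only and does not reproduce gunicorn's actual worker management.

```python
import os


def load_resource():
    # Hypothetical stand-in for an expensive load such as torch.load('model.pt').
    return {"weights": list(range(5)), "loaded_in_pid": os.getpid()}


def run_preloaded_workers(num_workers=3):
    """Load once in the parent, then fork workers that inherit the object."""
    resource = load_resource()  # happens once, before any fork
    seen_by_workers = []
    for _ in range(num_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the pid recorded in the inherited resource.
            os.close(read_fd)
            os.write(write_fd, str(resource["loaded_in_pid"]).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: read the child's report, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            seen_by_workers.append(int(reader.read()))
        os.waitpid(pid, 0)
    return os.getpid(), seen_by_workers


if __name__ == "__main__":
    parent_pid, seen = run_preloaded_workers()
    # Every worker sees the copy that was loaded in the parent process.
    print(all(pid == parent_pid for pid in seen))
```

Each forked worker inherits whatever state the parent had at fork time, which is why anything initialised during the preload step (not just the loaded object itself) is carried into the workers.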
OMP_NUM_THREADS is set to 2.
app.py contains the following code:
    from flask import Flask, jsonify
    import torch
    from create_model import test

    app = Flask(__name__)
    model = torch.load('model.pt')


    @app.route('/predict', methods=['POST', 'GET'])
    def prediction():
        constant_input = torch.randn(20, 16, 50, 100)
        prediction = model(constant_input)
        # A tensor is not JSON-serializable, so convert it to nested lists first.
        return jsonify(prediction.tolist())
model.pt is created using create_model.py, which contains:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class test(nn.Module):
        def __init__(self):
            super(test, self).__init__()
            self.conv1 = nn.Conv2d(16, 33, 3, stride=2)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return x


    m = test()
    input = torch.randn(20, 16, 50, 100)
    print(m(input))
    torch.save(m, 'model.pt')
However, I am not able to run inference with this setup.
If I replace the torch model with some numpy operation and just return its output, inference works.
With gunicorn app:app --preload --workers 3 --threads 2, I am also able to run inference. Could anyone please explain why the behaviour differs only when threads are used, and only in the case of PyTorch?