Mask RCNN for Production

AI-P-K · August 20, 2021, 1:06pm

I have trained a Mask RCNN starting from the pre-trained resnet50_fpn. I want to use the saved weights in a production environment where I store savedmodel.pt on a server and send data to it for inference. All goes well until 3 or more requests are made to the model. If more than 3 requests are made at the same time, the inference time goes from 2 seconds per image to 15 seconds. Is this normal? Is there any way to improve this?
PS: The behaviour is the same on CPU and GPU.
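
In case it helps to reproduce the numbers, here is a minimal sketch of how the slowdown can be measured. It is not the actual server code: `fake_infer`, `timed_call`, and `measure_latency` are hypothetical names, and `fake_infer` is only a placeholder that sleeps instead of running the model. Swapping in the real model call should show how per-request latency changes as the number of simultaneous requests grows.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(infer, item):
    """Run one inference call and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    infer(item)
    return time.perf_counter() - start


def measure_latency(infer, items, concurrency):
    """Send every item through `infer` using `concurrency` worker threads.

    Returns the list of per-request latencies, in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda item: timed_call(infer, item), items))


def fake_infer(item):
    # Placeholder (assumption): replace with the real model call,
    # for example running the loaded Mask RCNN on one image.
    time.sleep(0.01)


if __name__ == "__main__":
    # Compare latency for 1, 2, 4 and 8 simultaneous requests.
    for n in (1, 2, 4, 8):
        latencies = measure_latency(fake_infer, range(n), concurrency=n)
        mean = sum(latencies) / len(latencies)
        print(f"{n} concurrent: mean {mean:.3f}s, max {max(latencies):.3f}s")
```

With the sleep placeholder the latency stays flat, because the calls do not compete for anything. If the real model shows latency growing roughly in proportion to the number of concurrent requests, that would suggest the requests are contending for the same CPU cores or the same GPU rather than running independently.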