How to improve inference time?

Hi Everyone,

Can anyone recommend an efficient way to reduce inference time? Currently, my model takes 50 seconds per inference.

Objective: I have built a Flask app, and each request takes 50 seconds to run inference. I would like to reduce the inference time so the Flask app can return results faster.

Steps already taken:

  1. Moved the model to the GPU (see the sketch below)
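
For reference, this is roughly what that step looks like; a minimal sketch, assuming a standard torchvision-style detection model (`get_detection_model`, `preprocess`, and `raw_image` are placeholders, not names from my actual code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = get_detection_model()  # placeholder: however your model is built/loaded
model.to(device)               # move the weights onto the GPU
model.eval()                   # disable dropout / batch-norm updates

with torch.no_grad():          # skip autograd bookkeeping during inference
    image = preprocess(raw_image)          # placeholder: returns a CHW float tensor
    outputs = model([image.to(device)])    # inputs must live on the same device
```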

Note:
Framework: PyTorch
Application: Object detection

Any recommendations on how to optimize this would be really helpful.

There’s too little info here for anyone to help you. What are your system specs? What GPU are you using? Is this batched inference, or a single input at a time? What does the Flask app do?
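
Also worth checking before you optimize anything: how the 50 s is being measured. CUDA calls are asynchronous, so a naive timer can fold one-time costs (model load, CUDA context creation, first-call warm-up) into the number. A minimal sketch, assuming `model` and `inputs` are already on the GPU as in your post:

```python
import time
import torch

# Warm up: the first forward passes pay one-time costs (memory allocation,
# kernel selection) that should not count toward steady-state latency.
for _ in range(3):
    with torch.no_grad():
        model(inputs)

torch.cuda.synchronize()          # wait for all queued GPU work to finish
start = time.perf_counter()
with torch.no_grad():
    model(inputs)
torch.cuda.synchronize()          # make sure the forward pass actually completed
print(f"steady-state inference: {time.perf_counter() - start:.3f} s")
```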