PyTorch C++ Frontend

Hi,

I have to deploy an object detection model on Android. Currently, inference takes a lot of time (150-250 ms), and that is just the model inference itself. I am using MobileNetV1 with a 0.25 width multiplier as the backbone.

Now I am looking to reduce latency and am thinking of using the PyTorch C++ frontend.

Does anyone know how fast the PyTorch C++ frontend is, or is it the same speed as PyTorch in Python?

Another option is quantisation. Does it provide a major speed boost?

I have high hopes for the PyTorch C++ frontend, but I am not finding much information about it on the forum.