How to speed up inference time on Windows?

Hi All,

I’m working on a classification task. The scenario is that I detect faces in a video, crop them, and feed the cropped faces to a classifier to get attributes such as gender and age. Here are the details:

Framework: PyTorch
Model: ResNet-34
Input size: (x, 3, 200, 200), where x is the number of detected faces, usually 1 or 2.
System: Windows
Python version: 3.7.6
GPU: GeForce RTX 2070 Ti (8 GB)

After training, I measured the inference time on the test dataset and got under 10 ms per image (the first image is slower, around 30 ms, because the PyTorch model needs some warm-up; after that, the inference time stabilizes at around 10 ms or less). Note that there is only one face in each test image.
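
For context, the timing loop looked roughly like this (a simplified sketch, not my exact test code: torchvision’s resnet34 stands in for my trained model, and the input is a dummy 200x200 tensor):

    import time
    import torch
    from torchvision.models import resnet34

    device = torch.device("cuda")
    model = resnet34().to(device).eval()  # stand-in for the trained attribute model

    # Dummy input with the same shape as one cropped face
    dummy = torch.randn(1, 3, 200, 200, device=device)

    with torch.no_grad():
        # Warm-up passes so CUDA kernels / cuDNN algorithms get initialized
        for _ in range(5):
            model(dummy)
        torch.cuda.synchronize()

        start = time.perf_counter()
        model(dummy)
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
        elapsed_ms = (time.perf_counter() - start) * 1000

    print(f"inference time: {elapsed_ms:.1f} ms")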

But when I integrated the model into my software, the inference time is around 50 ms, which is much slower than I expected. This is how I load the model in the software:

model.load_state_dict(torch.load(model_path))

I’m sure I moved both the input and the model to the GPU and switched the model to eval mode. My software is also written in Python.
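
To be concrete, the load-and-predict path in the software is roughly this (a simplified sketch; torchvision’s resnet34 and the file path are placeholders for my actual attribute model):

    import torch
    from torchvision.models import resnet34

    model_path = "face_attributes.pt"  # placeholder path
    device = torch.device("cuda")

    model = resnet34()  # stand-in for my classifier architecture
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.to(device).eval()

    def predict(faces):
        # faces: tensor of shape (x, 3, 200, 200) holding the cropped faces
        with torch.no_grad():
            return model(faces.to(device, non_blocking=True))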

Does anyone have any idea about this issue?

Thanks, guys!

I’d check a few simple things:

  1. Are you sure the model is actually running on the GPU?
  2. Did you try torch.set_num_threads(1)? (See the snippet after this list for both checks.)
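
Something along these lines (a minimal sketch; the resnet34 model and random faces tensor are stand-ins for your own loaded model and input batch):

    import torch
    from torchvision.models import resnet34

    # Stand-ins for your model and cropped-face batch
    model = resnet34().to("cuda")
    faces = torch.randn(2, 3, 200, 200, device="cuda")

    # 1. Confirm both the weights and the input batch actually live on the GPU
    print(next(model.parameters()).device)  # expect cuda:0
    print(faces.device)                     # expect a cuda device as well

    # 2. Limit intra-op CPU threads; this sometimes helps when other parts of
    #    the application compete for cores
    torch.set_num_threads(1)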

And if all of the above fails, that’s where an inference-serving framework like TorchServe helps: https://github.com/pytorch/serve
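
For example, with a model archive registered on a locally running TorchServe instance, your application would just POST each cropped face to the inference endpoint instead of loading the model in-process (the model name "face_attributes" and the default port 8080 are assumptions here):

    import requests

    # Assumes TorchServe is running locally with a model registered as "face_attributes"
    with open("face.jpg", "rb") as f:
        resp = requests.post("http://localhost:8080/predictions/face_attributes", data=f)

    print(resp.json())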