I’m working on a classification task. The scenario: I detect faces in a video, crop the faces, and feed the crops to a classifier to get face attributes such as gender and age. Here are the details:
Model: ResNet-34
Input size: (x, 3, 200, 200), where x is the number of detected faces (usually 1 or 2).
Python version: 3.7.6
GPU: GeForce RTX 2070 Ti (8 GB)
After I finished training, I measured inference time on the test dataset and got <10 ms per image. (The first image is slower, around 30 ms, because the PyTorch model needs some warm-up; after that, inference time stabilizes at around 10 ms or less.) Note that each test image contains only one face.
But when I integrated the model into my software, inference takes around 50 ms, which is much slower than I expected. This is how I loaded the model in the software:
I’m sure I moved both the input and the model to the GPU and switched the model to eval mode. My software is also written in Python.
Does anyone have an idea what might be causing this slowdown?