I am trying to reduce model inference time in PyTorch on CPU by setting the number of threads to the maximum available. This does not help: the overall inference time actually increases. By default, the number of threads is half the available cores, and that default is faster than setting it to the maximum. Can anyone explain why this happens and suggest ways to optimize inference time on CPU?
Things I have tried:
Setting num_threads to the maximum number of CPU cores available
Using the Python multiprocessing library to run inference on multiple images in parallel
Batching the images
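
To make the thread-count comparison concrete, here is a minimal sketch of how it could be timed. The helper names time_thread_counts and benchmark_torch_model are hypothetical, and model and example_input stand in for the real model and input tensor; this is an illustration under those assumptions, not the exact code I ran. The only PyTorch calls used are torch.set_num_threads and torch.inference_mode.

```python
import time


def time_thread_counts(run_once, set_threads, thread_counts, warmup=3, repeats=10):
    """Return {thread_count: mean seconds per call of run_once}.

    run_once    -- zero-argument callable that performs one inference
    set_threads -- callable that applies a thread count (e.g. torch.set_num_threads)
    """
    results = {}
    for n in thread_counts:
        set_threads(n)
        # Warm-up calls are excluded from timing so one-time setup cost
        # does not skew the comparison.
        for _ in range(warmup):
            run_once()
        start = time.perf_counter()
        for _ in range(repeats):
            run_once()
        results[n] = (time.perf_counter() - start) / repeats
    return results


def benchmark_torch_model(model, example_input, thread_counts):
    """Hypothetical wrapper: time a PyTorch model at several thread counts."""
    import torch  # imported here so the helper above has no torch dependency

    model.eval()

    def run_once():
        # inference_mode disables autograd tracking for the forward pass
        with torch.inference_mode():
            model(example_input)

    return time_thread_counts(run_once, torch.set_num_threads, thread_counts)
```

Calling benchmark_torch_model(model, example_input, [1, 2, 4, 8]) returns a dict mapping each thread count to its mean time per forward pass, which makes it easy to compare the default setting against the maximum on the same machine.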
What other ways can I try to reduce the inference time?