PyTorch inference with multiple images

I wrote an API to serve my PyTorch object detection model, following the tutorial provided on the PyTorch website. The aim is to send the API an image and get back the result (classes, bounding boxes, etc.). For predicting a single image at a time it works like a charm.

Now I want to extend it: the aim is to send multiple images (in my case 5) to the API and get back all the results as fast as possible. In a TensorFlow tutorial I read that batching is a way to increase performance. Is it better to run the predictions sequentially, or to predict multiple images at once (batching)? Which method is faster? I run the inference on a GPU.
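For anyone comparing the two options: here is a minimal sketch of sequential vs. batched inference, assuming a torchvision Faster R-CNN like the one in the PyTorch tutorial (the image sizes and model are placeholders, swap in your own). Torchvision detection models accept a list of image tensors, so batching is just a single forward pass over the whole list:

```python
import torch
import torchvision

device = torch.device("cuda")
# Placeholder model; use your trained detection model instead.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval().to(device)

# Five dummy images standing in for the images sent to the API.
images = [torch.rand(3, 480, 640, device=device) for _ in range(5)]

with torch.no_grad():
    # Sequential: one forward pass per image.
    results_seq = [model([img])[0] for img in images]

    # Batched: the detection models take a list of images, so one
    # forward pass returns a list of result dicts, one per image.
    results_batch = model(images)
```

On a GPU, the batched call is usually faster because it amortizes kernel launches and data transfer over all five images, but it is worth timing both on your hardware (with `torch.cuda.synchronize()` around the timers).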

In my case, I used Python's concurrent.futures.ProcessPoolExecutor and concurrent.futures.ThreadPoolExecutor to run inference on 4 images at once.
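A rough sketch of the thread-pool variant (`predict` and the reuse of `model`/`device` from the snippet above are my assumptions, not the poster's exact code). Threads share the process, so the model does not have to be pickled to worker processes the way ProcessPoolExecutor requires:

```python
import torch
from concurrent.futures import ThreadPoolExecutor

def predict(image):
    # One single-image forward pass per worker thread.
    with torch.no_grad():
        return model([image.to(device)])[0]

images = [torch.rand(3, 480, 640) for _ in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(predict, images))
```

Note that because of the GIL and the fact that the GPU serializes the kernels anyway, threads mainly help overlap host-side work (preprocessing, transfers) rather than the forward passes themselves.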

But this one might help you.

@1chimaruGin thanks for your reply. I will try your first suggestion. I also found the second entry, but it didn’t really help to reduce the inference time.

Then, use TensorRT. It might help you a lot.
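For reference, a minimal Torch-TensorRT compilation sketch. It is shown on a plain ResNet-50 because two-stage detectors with dynamic output shapes often need extra work to convert; the input shape and FP16 precision below are assumptions you would adapt to your model:

```python
import torch
import torch_tensorrt  # pip install torch-tensorrt; requires a TensorRT installation

# Placeholder model; detection models may need per-model conversion work.
model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
model.eval().cuda()

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},  # allow FP16 kernels for extra speed
)

x = torch.rand(1, 3, 224, 224, device="cuda", dtype=torch.half)
with torch.no_grad():
    out = trt_model(x)
```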