I wrote an API to serve my PyTorch object detection model, following the tutorial on the PyTorch website. The aim is to send the API an image and get back the result (classes, bounding boxes, etc.). For predicting one image at a time it works like a charm.
Now I want to extend it: send multiple images to the API (in my case 5) and get back all the results as fast as possible. In a TensorFlow tutorial I read that batching is a way to increase performance. Is it better to run the predictions sequentially, one image at a time, or to predict all the images in a single batch? Which method is faster? I run the inference on a GPU.
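To make the comparison concrete, here is a minimal sketch of the two approaches with a toy model standing in for the detector (the real model's input format may differ; torchvision detection models, for example, accept a list of tensors and batch internally). Sequential inference makes one forward pass per image, while batching stacks the images into a single tensor and makes one pass:

```python
import torch
import torch.nn as nn

# Toy stand-in for the detection model; any nn.Module behaves the same way here.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# The 5 incoming images (same size, so they can be stacked directly).
images = [torch.rand(3, 64, 64) for _ in range(5)]

# Sequential: one forward pass per image (batch size 1).
with torch.no_grad():
    seq_out = [model(img.unsqueeze(0).to(device)) for img in images]

# Batched: stack into one (5, 3, 64, 64) tensor, single forward pass.
with torch.no_grad():
    batch_out = model(torch.stack(images).to(device))

print(batch_out.shape)  # one output row per image
```

On a GPU the batched pass typically wins, because it amortizes the per-call launch overhead and lets the GPU process all images in parallel; note that batching requires the images to share a size (or be padded/resized to one).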