TorchServe: dynamic batching?

johann-petrak · June 19, 2021, 9:54pm

I have been unable to figure out whether TorchServe supports dynamic batching and, if so, how.

I have a model whose throughput could be improved if we always ran more than one instance (batch size > 1) through the model at once.

So it would be great if TorchServe could collect the requests it receives within a certain time window and group them into batches for processing.
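
To make the behaviour I am after concrete, here is a minimal sketch in plain Python. It is not TorchServe code and uses nothing from TorchServe; `collect_batch`, `max_batch_size` and `max_delay_s` are names I made up for illustration. The idea: wait for the first request, then keep collecting until either the batch is full or the time window has passed.

```python
import queue
import time


def collect_batch(requests, max_batch_size, max_delay_s):
    """Gather up to max_batch_size requests from a queue.Queue.

    Hypothetical helper, only to illustrate the idea: block until the first
    request arrives, then keep collecting until the batch is full or
    max_delay_s seconds have passed since that first request.
    """
    batch = [requests.get()]  # wait indefinitely for the first request
    deadline = time.monotonic() + max_delay_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time window is over, send what we have
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived within the window
    return batch


if __name__ == "__main__":
    pending = queue.Queue()
    for request_id in range(5):
        pending.put(request_id)
    print(collect_batch(pending, max_batch_size=3, max_delay_s=0.05))
    print(collect_batch(pending, max_batch_size=3, max_delay_s=0.05))
```

With five requests already waiting and a batch size of 3, the first call returns a full batch right away, and the second returns the two leftover requests once the window expires. What I am looking for is the equivalent of those two knobs (batch size and maximum wait) in TorchServe itself.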

This should be configurable through the TorchServe config file or through command line options at startup, not through the TorchServe API itself, since it should already be set up when the TorchServe server is started with the initial model.

But I cannot find any documentation about how to do this.

Is this supported and, if so, how?