Asynchronous batch inference


I want to learn how to enable asynchronous batch inference. Normally, the samples in a batch are forwarded together, synchronously, within a single process. Instead, I want to forward each sample in a batch asynchronously, in different processes, because my model and data are too irregular to handle synchronously in one process (e.g., sample lengths within a batch vary too much).

My current idea is to use torch.multiprocessing as one solution. However, I don't know whether it is the appropriate approach. Are there any other solutions?
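For what it's worth, here is a minimal sketch of the torch.multiprocessing idea: dispatch each sample in the batch to its own worker process instead of forwarding the padded batch at once. Everything here is hypothetical illustration (the `forward_one` function is a stand-in for a real per-sample model forward); it uses the stdlib `multiprocessing` module, whose API `torch.multiprocessing` mirrors as a drop-in replacement.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same API


def forward_one(sample):
    # Hypothetical per-sample "forward" pass; in practice this would
    # call model(sample). Here it just simulates variable-length work.
    return sum(range(sample))


def batch_forward_async(batch, num_workers=4):
    # Each sample becomes its own task, so a long sample does not
    # force the short samples in the batch to wait on padding.
    with mp.Pool(num_workers) as pool:
        return pool.map(forward_one, batch)


if __name__ == "__main__":
    print(batch_forward_async([3, 10, 1, 7]))  # -> [3, 45, 0, 21]
```

One caveat with real models: each worker needs access to the model weights, so you would typically load (or share) the model once per worker rather than per call.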


Have you checked TorchServe?

Hi Can, thanks for your reply. Yes, but my case is more complicated. For example, my model is an encoder-decoder model: I want the samples in a batch to be encoded synchronously but decoded asynchronously. I didn't find that TorchServe can do this kind of thing, and I don't think any framework can support it without some customization work.
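The encode-sync/decode-async split described above could be sketched like this: run the encoder over the whole batch in the main process, then fan the encoded states out to worker processes for per-sample decoding. The `encode_batch` and `decode_one` functions are hypothetical stand-ins for the real encoder forward and autoregressive decode loop; as before, `torch.multiprocessing` mirrors the stdlib `multiprocessing` API used here.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same API


def encode_batch(batch):
    # Stand-in for one synchronous encoder forward over the padded batch.
    return [x * 2 for x in batch]


def decode_one(encoded):
    # Stand-in for an autoregressive decode loop whose number of steps
    # varies per sample -- the reason decoding is dispatched per process.
    out = 0
    for _ in range(encoded):  # hypothetical: decode `encoded` steps
        out += 1
    return out


def encode_sync_decode_async(batch, num_workers=2):
    # 1) Encode the whole batch synchronously in this process.
    memory = encode_batch(batch)
    # 2) Decode asynchronously: one task per sample, so a sample that
    #    needs many decode steps no longer stalls the rest of the batch.
    with mp.Pool(num_workers) as pool:
        return pool.map(decode_one, memory)


if __name__ == "__main__":
    print(encode_sync_decode_async([1, 5, 2]))  # -> [2, 10, 4]
```

With real tensors you would need to move the encoder outputs into shared memory (or pass CPU copies) before handing them to the workers.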

Hi @zenosama, we should be able to handle this with our workflows feature. Let me know if this helps: serve/examples/Workflows at master · pytorch/serve · GitHub

Thanks, Mark, for your help. I will definitely look into the examples to see how they would work in my case.

Thank you