Asynchronous batch inference

Hello,

I want to learn how to enable asynchronous batch inference. In the common setup, all samples in a batch are computed (forwarded) at the same time, synchronously, within one process. Instead, I want to compute (forward) each sample of a batch asynchronously, in different processes, because my model and data are too irregular to handle synchronously in a single process (e.g., sample lengths within a batch vary too much).

My current idea is to use torch.multiprocessing. However, I don't know whether it is the appropriate approach. Are there any other solutions?
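
For concreteness, here is a rough, CPU-only sketch of what I mean with torch.multiprocessing: the model's weights are shared across processes and each sample of the "batch" is forwarded in its own worker process. The toy Linear model and the sizes are placeholders, not my real setup.

```python
import torch
import torch.multiprocessing as mp


def forward_one(model, sample, rank, out_queue):
    # Each worker computes the forward pass for exactly one sample.
    with torch.no_grad():
        out_queue.put((rank, model(sample.unsqueeze(0))))


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)

    model = torch.nn.Linear(16, 4)   # toy stand-in for the real model
    model.share_memory()             # share parameters across processes instead of copying

    # "Batch" kept as a list; in the real case each sample could have a different
    # length (the toy Linear model here needs a fixed feature size, though).
    batch = [torch.randn(16) for _ in range(4)]

    out_queue = mp.Queue()
    workers = []
    for rank, sample in enumerate(batch):
        p = mp.Process(target=forward_one, args=(model, sample, rank, out_queue))
        p.start()
        workers.append(p)

    # Drain the queue before joining the workers.
    results = dict(out_queue.get() for _ in batch)
    for p in workers:
        p.join()

    outputs = [results[rank] for rank in range(len(batch))]
    print(torch.cat(outputs).shape)  # torch.Size([4, 4])
```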

Thanks

Have you checked out TorchServe?

Hi Can, thanks for your reply. Yes, but my case is more complicated. For example, my model is an encoder-decoder model: I would like the samples in a batch to be encoded synchronously but decoded asynchronously. I didn't find that TorchServe can do this kind of thing, and I don't think any framework can support it without some customization work. A rough sketch of the split I have in mind is below.
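
To make the split concrete: encode the whole batch in one synchronous forward pass, then hand each sample's decoding to its own process, so every sequence can stop at its own length. The toy encoder/decoder modules and the greedy decode loop are placeholders I made up to show the control flow, not my real model.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class ToyDecoder(nn.Module):
    """Placeholder decoder: embeds tokens, mixes in the encoder memory, predicts logits."""

    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, memory):
        h = self.embed(tokens) + memory   # memory is (dim,), broadcast over (1, t, dim)
        return self.out(h)                # (1, t, vocab)


def decode_one(decoder, memory, rank, out_queue, max_len=20):
    # Greedy decoding for a single sample; each process stops at its own length.
    tokens = [0]                          # assume 0 = BOS in this toy setup
    with torch.no_grad():
        for _ in range(max_len):
            inp = torch.tensor(tokens).unsqueeze(0)   # (1, t)
            logits = decoder(inp, memory)
            nxt = int(logits[0, -1].argmax())
            tokens.append(nxt)
            if nxt == 1:                  # assume 1 = EOS
                break
    out_queue.put((rank, tokens))


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)

    encoder = nn.Linear(8, 16)            # stand-in for the real encoder
    decoder = ToyDecoder()
    decoder.share_memory()

    # 1) Encode the whole batch synchronously in one forward pass.
    src = torch.randn(4, 8)
    with torch.no_grad():
        memory = encoder(src)             # (4, 16)

    # 2) Decode each sample asynchronously in its own process.
    out_queue = mp.Queue()
    procs = []
    for rank in range(memory.size(0)):
        p = mp.Process(target=decode_one, args=(decoder, memory[rank], rank, out_queue))
        p.start()
        procs.append(p)

    results = dict(out_queue.get() for _ in procs)
    for p in procs:
        p.join()

    for rank in sorted(results):
        print(rank, results[rank])
```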

Hi @zenosama, we should be able to handle this with our workflows feature - let me know if this helps: serve/examples/Workflows at master · pytorch/serve · GitHub

Thanks, Mark, for your help. I will definitely look into the examples to see how they could work in my case.

Thank you