Asynchronous batch inference


I want to learn how to enable asynchronous batch inference. Normally, the samples in a batch are forwarded together, synchronously, within a single process. Instead, I want to forward each sample in a batch asynchronously, in different processes, because my model and data are too irregular to handle synchronously in one process (e.g., sample lengths within a batch vary too much).

My current idea is to use torch.multiprocessing as one solution. However, I don't know whether it is the appropriate approach. Are there any other solutions?
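For what it's worth, here is a minimal sketch of the torch.multiprocessing idea: dispatch each sample in the batch to its own worker process instead of forwarding the padded batch at once. Everything here is hypothetical illustration (the `forward_one` function is a stand-in for a real per-sample model forward); it uses the stdlib `multiprocessing` module, whose API `torch.multiprocessing` mirrors as a drop-in replacement.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same API


def forward_one(sample):
    # Hypothetical per-sample "forward" pass; in practice this would
    # call model(sample). Here it just simulates variable-length work.
    return sum(range(sample))


def batch_forward_async(batch, num_workers=4):
    # Each sample becomes its own task, so a long sample does not
    # force the short samples in the batch to wait on padding.
    with mp.Pool(num_workers) as pool:
        return pool.map(forward_one, batch)


if __name__ == "__main__":
    print(batch_forward_async([3, 10, 1, 7]))  # -> [3, 45, 0, 21]
```

One caveat with real models: each worker needs access to the model weights, so you would typically load (or share) the model once per worker rather than per call.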


Have you checked TorchServe?

Hi Can, thanks for your reply. Yes, but my case is more complicated. For example, my model is an encoder-decoder model: I want the samples in a batch to be encoded synchronously but decoded asynchronously. I didn't find that TorchServe can do this kind of thing, and I don't think any framework can support it without some customization work.
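The encode-sync/decode-async split described above could be sketched like this: run the encoder over the whole batch in the main process, then fan the encoded states out to worker processes for per-sample decoding. The `encode_batch` and `decode_one` functions are hypothetical stand-ins for the real encoder forward and autoregressive decode loop; as before, `torch.multiprocessing` mirrors the stdlib `multiprocessing` API used here.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same API


def encode_batch(batch):
    # Stand-in for one synchronous encoder forward over the padded batch.
    return [x * 2 for x in batch]


def decode_one(encoded):
    # Stand-in for an autoregressive decode loop whose number of steps
    # varies per sample -- the reason decoding is dispatched per process.
    out = 0
    for _ in range(encoded):  # hypothetical: decode `encoded` steps
        out += 1
    return out


def encode_sync_decode_async(batch, num_workers=2):
    # 1) Encode the whole batch synchronously in this process.
    memory = encode_batch(batch)
    # 2) Decode asynchronously: one task per sample, so a sample that
    #    needs many decode steps no longer stalls the rest of the batch.
    with mp.Pool(num_workers) as pool:
        return pool.map(decode_one, memory)


if __name__ == "__main__":
    print(encode_sync_decode_async([1, 5, 2]))  # -> [2, 10, 4]
```

With real tensors you would need to move the encoder outputs into shared memory (or pass CPU copies) before handing them to the workers.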

Hi @zenosama, we should be able to handle this with our workflows feature. Let me know if this helps: serve/examples/Workflows at master · pytorch/serve · GitHub

Thanks, Mark, for your help. I will definitely look into the examples to see how they would work in my case.

Thank you