I saw some support for DAGs in the TorchServe docs: TorchServe Workflows — PyTorch/Serve master documentation.
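If I read the workflow docs right, the spec is a YAML file with a `models` section pointing at `.mar` archives and a `dag` section giving the adjacency of stages. Something like this (the names `m1`, `pre_processing`, `post_processing` and the file names are just my placeholders, not from an actual deployment):

```yaml
# Hypothetical workflow spec sketch, loosely following the TorchServe docs.
models:
  min-workers: 1
  max-workers: 4
  m1:
    url: my_model.mar   # placeholder archive name

dag:
  pre_processing: [m1]
  m1: [post_processing]
```

But as far as I can tell, this only wires handlers into a DAG; it doesn't obviously let me control per-stage thread pools or queues, which is what I'm after below.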
I wonder whether TorchServe supports model pipelining, e.g. preprocessing and postprocessing running on their own pools of CPU threads, so that the main GPU model thread spends as little time as possible waiting.
For that, the preprocessing and postprocessing stages would each need their own input/output queues and their own pools of CPU threads (maybe even pinned to distinct CPU cores), with preprocessing, the main model, and postprocessing connected in some sort of pipeline.
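To make concrete what I mean, here is a plain-Python sketch of that pipeline shape (this is NOT TorchServe API, just a stand-in): each stage owns a bounded queue and its own pool of worker threads, and the stage functions here are trivial placeholders for decoding / model forward / encoding.

```python
# Sketch: three pipeline stages, each with its own queue and thread pool,
# so CPU-heavy pre/post stages don't block the (single-threaded) GPU stage.
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def run_stage(in_q, out_q, fn, n_workers):
    """Spawn n_workers threads applying fn to items from in_q into out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                in_q.put(SENTINEL)      # re-post so sibling workers also stop
                return
            out_q.put(fn(item))
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    def closer():                        # when all workers exit, close downstream
        for t in workers:
            t.join()
        out_q.put(SENTINEL)
    threading.Thread(target=closer).start()

# Placeholder stage bodies; real ones would do video decoding,
# model.forward on the GPU, and result serialization.
q0, q1, q2, q3 = (queue.Queue(maxsize=8) for _ in range(4))
run_stage(q0, q1, fn=lambda x: x * 2, n_workers=4)   # preprocessing pool
run_stage(q1, q2, fn=lambda x: x + 1, n_workers=1)   # "GPU" model stage
run_stage(q2, q3, fn=lambda x: x - 1, n_workers=4)   # postprocessing pool

for i in range(10):
    q0.put(i)
q0.put(SENTINEL)

results = []
while (item := q3.get()) is not SENTINEL:
    results.append(item)
print(sorted(results))  # completion order is nondeterministic across threads
```

In real code the queues would carry tensors/frames and the middle stage would be the GPU model; the point is just the per-stage queues and pools. Can TorchServe workflows express this kind of topology natively?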
Is this scenario supported by TorchServe?
Ideally, even multi-node deployments like this would be useful, e.g. video file decoding done on one powerful CPU machine that streams its results to a light-CPU + strong-GPU machine. If pipelines/queues like this can be natively orchestrated by TorchServe, that would be really great.