I saw some support for DAGs in the TorchServe docs: TorchServe Workflows — PyTorch/Serve master documentation.
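If I read the workflow docs right, the spec is a YAML file with a `models` section pointing at `.mar` archives and a `dag` section giving the adjacency of stages. Something like this (the names `m1`, `pre_processing`, `post_processing` and the file names are just my placeholders, not from an actual deployment):

```yaml
# Hypothetical workflow spec sketch, loosely following the TorchServe docs.
models:
  min-workers: 1
  max-workers: 4
  m1:
    url: my_model.mar   # placeholder archive name

dag:
  pre_processing: [m1]
  m1: [post_processing]
```

But as far as I can tell, this only wires handlers into a DAG; it doesn't obviously let me control per-stage thread pools or queues, which is what I'm after below.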
I wonder whether TorchServe supports model pipelining, e.g. preprocessing and postprocessing running on their own pools of CPU threads, so that the main GPU model thread spends as little time as possible waiting.
For that, the preprocessing and postprocessing stages would each need their own input/output queues and their own pools of CPU threads (maybe even pinned to distinct CPU cores), with preprocessing, the main model, and postprocessing connected in some sort of pipeline.
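To make concrete what I mean, here is a plain-Python sketch of that pipeline shape (this is NOT TorchServe API, just a stand-in): each stage owns a bounded queue and its own pool of worker threads, and the stage functions here are trivial placeholders for decoding / model forward / encoding.

```python
# Sketch: three pipeline stages, each with its own queue and thread pool,
# so CPU-heavy pre/post stages don't block the (single-threaded) GPU stage.
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def run_stage(in_q, out_q, fn, n_workers):
    """Spawn n_workers threads applying fn to items from in_q into out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                in_q.put(SENTINEL)      # re-post so sibling workers also stop
                return
            out_q.put(fn(item))
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    def closer():                        # when all workers exit, close downstream
        for t in workers:
            t.join()
        out_q.put(SENTINEL)
    threading.Thread(target=closer).start()

# Placeholder stage bodies; real ones would do video decoding,
# model.forward on the GPU, and result serialization.
q0, q1, q2, q3 = (queue.Queue(maxsize=8) for _ in range(4))
run_stage(q0, q1, fn=lambda x: x * 2, n_workers=4)   # preprocessing pool
run_stage(q1, q2, fn=lambda x: x + 1, n_workers=1)   # "GPU" model stage
run_stage(q2, q3, fn=lambda x: x - 1, n_workers=4)   # postprocessing pool

for i in range(10):
    q0.put(i)
q0.put(SENTINEL)

results = []
while (item := q3.get()) is not SENTINEL:
    results.append(item)
print(sorted(results))  # completion order is nondeterministic across threads
```

In real code the queues would carry tensors/frames and the middle stage would be the GPU model; the point is just the per-stage queues and pools. Can TorchServe workflows express this kind of topology natively?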
Is this scenario supported by TorchServe?
Ideally, even multi-node deployments like this would be useful, e.g. video file decoding done on one powerful CPU machine that streams its results to a light-CPU + strong-GPU machine. If pipelines/queues like this can be natively orchestrated by TorchServe, that would be really great.