Parallel inference in PyTorch

When I perform deep learning inference in PyTorch, some models have branch structures in their architecture. In this case, does PyTorch consider dispatching the operators on these branches to different streams for parallel execution? Based on my testing results, it seems that there is no such mechanism. So I want to know why this is the case, or how I can achieve this parallel inference.
Thank you so much!


AFAIK Python doesn't have a mechanism for branch prediction, because in Python and PyTorch the code is executed eagerly, i.e. line by line.

So to make the code run faster, you can use a JIT such as torch.jit.script or torch.compile, which will specialize on one or another branch. There are also some active discussions about explicitly adding control flow ops in torch: "Add support for dynamic control flow in torch.fx", Issue #99598, pytorch/pytorch on GitHub.
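On the streams part of the question: as far as I know, eager mode issues every op on the current CUDA stream, so independent branches only overlap if you place them on separate streams yourself. Below is a minimal sketch of that, not an official recipe. The TwoBranch module, its layer sizes, and the input shape are made up for illustration; only torch.cuda.Stream, torch.cuda.stream, wait_stream, and record_stream are real PyTorch APIs. It falls back to plain sequential execution on CPU.

```python
import torch
import torch.nn as nn


class TwoBranch(nn.Module):
    """Hypothetical model with two independent branches (illustration only)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        if not x.is_cuda:
            # CPU fallback: no streams, the branches run one after the other.
            return self.branch_a(x) + self.branch_b(x)

        current = torch.cuda.current_stream()
        stream_a, stream_b = torch.cuda.Stream(), torch.cuda.Stream()

        # The side streams must not start before x is ready on the current stream.
        stream_a.wait_stream(current)
        stream_b.wait_stream(current)

        with torch.cuda.stream(stream_a):
            out_a = self.branch_a(x)
        with torch.cuda.stream(stream_b):
            out_b = self.branch_b(x)

        # The current stream waits for both branches before combining them.
        current.wait_stream(stream_a)
        current.wait_stream(stream_b)

        # Tell the caching allocator these tensors are also used on `current`.
        out_a.record_stream(current)
        out_b.record_stream(current)
        return out_a + out_b


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TwoBranch().to(device).eval()
x = torch.randn(8, 64, device=device)

with torch.no_grad():
    y = model(x)
    # Sequential reference, computed without any extra streams.
    reference = model.branch_a(x) + model.branch_b(x)
```

Whether this actually gives a speedup depends on the model: if each kernel already fills the GPU, or the branches are so small that launch overhead dominates, the two streams may end up running back to back anyway. It is worth profiling before and after.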