Pipeline batches during inference

Google published a paper where they “pipeline” the inference (and backpropagation) stages, processing one layer at a time.

I thought that most frameworks already do this during the forward pass at inference time, since you don’t need to wait for one batch to finish computing all of its outputs before loading the next batch.
If that’s not the case, is any work on this planned for PyTorch?
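To illustrate the idea being asked about, here is a minimal pure-Python sketch (not PyTorch’s actual implementation) of pipelined stages: each stage runs in its own thread, so stage i can start on the next batch while stage i+1 is still working on the previous one. The `pipeline` helper and the toy lambda “layers” are made up for illustration.

```python
import queue
import threading

def pipeline(batches, stages, queue_size=2):
    """Run `stages` (a list of functions) over `batches` in pipelined
    fashion: one worker thread per stage, connected by bounded queues,
    so stages overlap across consecutive batches."""
    qs = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    SENTINEL = object()  # marks end of the batch stream

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
        for i, s in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed batches in; order is preserved because each stage is a
    # single thread draining a FIFO queue.
    for b in batches:
        qs[0].put(b)
    qs[0].put(SENTINEL)

    results = []
    while True:
        out = qs[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Two toy "layers"; in a real model these would be model partitions,
# typically placed on different devices.
outs = pipeline([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 10])
print(outs)  # [20, 30, 40, 50]
```

In a real framework the stages would be model partitions on separate devices and the hand-off would be device-to-device transfers rather than thread queues, but the overlap structure is the same.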


It’s here: Pipeline Parallelism — PyTorch 1.10 documentation