Pipeline batches during inference

Google published a paper where they “pipeline” the inference (and backpropagation) stages, processing one layer at a time.

I thought that most frameworks already do this during the forward pass at inference time, since you don’t need to wait for one batch to finish computing all of its outputs before loading the next batch.
If that’s not the case, is any work on this planned for PyTorch?
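To illustrate the idea being asked about, here is a minimal pure-Python sketch (not PyTorch’s actual implementation) of pipelined stages: each stage runs in its own thread, so stage i can start on the next batch while stage i+1 is still working on the previous one. The `pipeline` helper and the toy lambda “layers” are made up for illustration.

```python
import queue
import threading

def pipeline(batches, stages, queue_size=2):
    """Run `stages` (a list of functions) over `batches` in pipelined
    fashion: one worker thread per stage, connected by bounded queues,
    so stages overlap across consecutive batches."""
    qs = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    SENTINEL = object()  # marks end of the batch stream

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
        for i, s in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed batches in; order is preserved because each stage is a
    # single thread draining a FIFO queue.
    for b in batches:
        qs[0].put(b)
    qs[0].put(SENTINEL)

    results = []
    while True:
        out = qs[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Two toy "layers"; in a real model these would be model partitions,
# typically placed on different devices.
outs = pipeline([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 10])
print(outs)  # [20, 30, 40, 50]
```

In a real framework the stages would be model partitions on separate devices and the hand-off would be device-to-device transfers rather than thread queues, but the overlap structure is the same.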


It’s here: Pipeline Parallelism — PyTorch 1.10 documentation