Pipeline batches during inference

Google published a paper where they “pipeline” the inference (and backpropagation) stages, processing one layer at a time.

I thought that most frameworks already do this during the forward pass at inference time, since you don’t need to wait for one batch to finish computing all of its outputs before loading the next batch.
If that’s not the case, is any work on this planned for PyTorch?
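To illustrate the idea being asked about, here is a minimal pure-Python sketch (not PyTorch’s actual implementation) of pipelined stages: each stage runs in its own thread, so stage i can start on the next batch while stage i+1 is still working on the previous one. The `pipeline` helper and the toy lambda “layers” are made up for illustration.

```python
import queue
import threading

def pipeline(batches, stages, queue_size=2):
    """Run `stages` (a list of functions) over `batches` in pipelined
    fashion: one worker thread per stage, connected by bounded queues,
    so stages overlap across consecutive batches."""
    qs = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    SENTINEL = object()  # marks end of the batch stream

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is SENTINEL:
                q_out.put(SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
        for i, s in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed batches in; order is preserved because each stage is a
    # single thread draining a FIFO queue.
    for b in batches:
        qs[0].put(b)
    qs[0].put(SENTINEL)

    results = []
    while True:
        out = qs[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Two toy "layers"; in a real model these would be model partitions,
# typically placed on different devices.
outs = pipeline([1, 2, 3, 4], [lambda x: x + 1, lambda x: x * 10])
print(outs)  # [20, 30, 40, 50]
```

In a real framework the stages would be model partitions on separate devices and the hand-off would be device-to-device transfers rather than thread queues, but the overlap structure is the same.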


It’s here: Pipeline Parallelism — PyTorch 1.10 documentation