CPU Parallelism for Cumulative Loops

This use case doesn’t really need distributed training. For task-level parallelism on CPU you can try torch.jit.fork, which is the main mechanism PyTorch provides for running independent pieces of work asynchronously: torch.jit.fork — PyTorch 1.10.0 documentation.
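Here is a minimal sketch of how that could look for a chunked cumulative sum. The names (`chunk_cumsum`, `parallel_cumsum`), the cumsum workload, and the chunk count are just placeholders for your actual loop body; the point is the fork/wait pattern. If I recall correctly, fork only schedules tasks asynchronously when called from TorchScript, so the driver function is wrapped in `@torch.jit.script` (check the linked docs to confirm for your version):

```python
import torch
from typing import List


def chunk_cumsum(x: torch.Tensor) -> torch.Tensor:
    # Per-chunk cumulative work; stand-in for the body of your loop.
    return torch.cumsum(x, dim=0)


@torch.jit.script
def parallel_cumsum(chunks: List[torch.Tensor]) -> torch.Tensor:
    # Launch one asynchronous task per chunk: fork returns a Future
    # immediately, and wait blocks until that task has finished.
    futures = [torch.jit.fork(chunk_cumsum, c) for c in chunks]
    results = [torch.jit.wait(f) for f in futures]

    # Stitch the chunks back into one global cumulative sum by adding the
    # running total of all preceding chunks to each piece.
    offset = torch.zeros_like(results[0][:1])
    outputs: List[torch.Tensor] = []
    for r in results:
        outputs.append(r + offset)
        offset = offset + r[-1]
    return torch.cat(outputs)


if __name__ == "__main__":
    # float64 keeps the chunked and sequential results numerically comparable.
    data = torch.randn(1_000_000, dtype=torch.float64)
    pieces = list(torch.chunk(data, 8))  # 8 chunks is an arbitrary choice
    out = parallel_cumsum(pieces)
    assert torch.allclose(out, torch.cumsum(data, dim=0))
```

The forked tasks run on PyTorch’s inter-op thread pool, which you can size with `torch.set_num_interop_threads()` (call it before the first forked task). One caveat: this only helps if the cumulative loop can be split into chunks whose results are cheap to re-combine, like the offsets above; if every step truly depends on the full previous result, the loop stays inherently sequential.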