DDP + Model Parallelism Tutorial

I have an existing DDP pipeline that fits on my single-node, multi-GPU system. But when I increase the size of my model, the forward pass no longer fits on one GPU. Is there any tutorial on implementing DDP and model parallelism together?

@QasimKhan5x thanks for posting. We recently released FSDP (FullyShardedDataParallel) to handle the case where the model doesn't fit on one GPU. You can take a look at this doc page and try it out: Getting Started with Fully Sharded Data Parallel (FSDP) — PyTorch Tutorials 1.12.1+cu102 documentation
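To make the suggestion concrete, here is a minimal sketch of wrapping a model in FSDP. The toy model, dimensions, and hyperparameters are illustrative (not from the linked tutorial), and it assumes a `torchrun` launch with one GPU per process:

```python
# Minimal FSDP sketch. Assumptions: launched with torchrun (which sets
# RANK/LOCAL_RANK/WORLD_SIZE), one CUDA device per process; the toy model
# and sizes are made up for illustration.
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class ToyModel(nn.Module):
    """Stand-in for a model too large to fit on one GPU."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def main():
    # torchrun provides the rendezvous info via environment variables.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so each GPU holds only a fraction of the full model at rest.
    model = FSDP(ToyModel().cuda())
    optim = torch.optim.Adam(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

Launch it with something like `torchrun --nproc_per_node=<num_gpus> script.py`. Unlike DDP, the model is wrapped once in `FSDP` and sharding replaces full replication, so the per-GPU memory footprint drops as you add ranks.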
