How to do model parallelism together with data parallelism?

Hi, I want to combine model parallelism and data parallelism in PyTorch.
For example, my model is too large for a single GPU and has to be split across 2 GPUs with a batch size of 1. At the same time, I want to make full use of all 8 GPUs.
I know how to do model parallelism, and I know how to do data parallelism, but how do I do both at the same time?
Is there an example for my case?
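
To make the setup concrete, here is a rough sketch of what I have in mind: 4 processes, each owning 2 GPUs, with the model split across those 2 GPUs and the 4 copies kept in sync with `DistributedDataParallel`. The names `devices_for_rank`, `TwoDeviceNet`, and `build_replica` are made up for illustration, the layer sizes are arbitrary, and I am assuming the process group has already been initialised. I am not sure this is the recommended way, so treat it as a guess.

```python
import torch
import torch.nn as nn


def devices_for_rank(rank, gpus_per_replica=2):
    """Return the GPU indices owned by one process, e.g. rank 1 -> [2, 3]."""
    first = rank * gpus_per_replica
    return list(range(first, first + gpus_per_replica))


class TwoDeviceNet(nn.Module):
    """Toy stand-in for the real model: each half lives on its own device."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part1 = nn.Linear(16, 32).to(dev0)
        self.part2 = nn.Linear(32, 4).to(dev1)

    def forward(self, x):
        # Run the first half on dev0, then move activations to dev1.
        h = torch.relu(self.part1(x.to(self.dev0)))
        return self.part2(h.to(self.dev1))


def build_replica(rank):
    """Build one model-parallel replica and wrap it for data parallelism.

    Assumption: torch.distributed.init_process_group(...) was already called
    in this process, and the machine has 2 GPUs available per rank.
    """
    from torch.nn.parallel import DistributedDataParallel as DDP

    d0, d1 = devices_for_rank(rank)
    model = TwoDeviceNet(f"cuda:{d0}", f"cuda:{d1}")
    # The module spans several devices, so device_ids / output_device
    # are left unset here.
    return DDP(model)
```

With 8 GPUs this would mean launching 4 processes (for example with `torchrun --nproc_per_node=4`), each calling `build_replica` with its own rank and reading a different shard of the data, so that ranks 0 to 3 use GPU pairs (0, 1), (2, 3), (4, 5), and (6, 7).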
