Any plans on distributed training for large models?

Hi team,

I noticed PyTorch's new DistributedTensor (DTensor) feature, which provides a new way to build large models, and the new TorchDynamo feature in PyTorch 2.0, which provides a new way to capture graphs. So are there any plans for PyTorch to capture the whole graph of a model on a single GPU and automatically translate it to distributed execution with model parallelism and pipeline parallelism based on the available GPU resources?
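To make the question concrete, here is a minimal sketch of the two pieces I have in mind, assuming the prototype DTensor API under `torch.distributed._tensor` in PyTorch 2.0 (module paths and names may change in later releases). Today they exist separately; my question is whether the captured graph could be automatically lowered to the sharded/pipelined form.

```python
# Run with e.g.: torchrun --nproc_per_node=<num_gpus> sketch.py
# Sketch only: uses the prototype DTensor API from torch.distributed._tensor.
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())
world_size = dist.get_world_size()

# 1-D device mesh spanning all ranks.
mesh = DeviceMesh("cuda", list(range(world_size)))

# Shard a large weight along dim 0; each rank holds 8192 // world_size rows.
weight = torch.randn(8192, 8192, device="cuda")
sharded = distribute_tensor(weight, mesh, placements=[Shard(0)])
print(sharded.to_local().shape)

# TorchDynamo graph capture of an ordinary single-GPU model via torch.compile;
# the graph is captured on the first call.
model = torch.nn.Linear(8192, 8192).cuda()
compiled = torch.compile(model)
out = compiled(torch.randn(16, 8192, device="cuda"))
```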

Thanks, any response will be appreciated.

cc @wanchaol @aazzolini

@weberxie Thanks for the question! Yes, we are actively exploring and building out a compiler-based distributed training stack with DTensor and PyTorch 2.0. We will release it as a prototype once we finish the building blocks.
