DDP multiple models, share subset of common params

Hello all,

I would like to train multiple models, each on a separate GPU, using DDP. The challenge is that each model has some parameters unique to it, plus some common parameters that are shared by all models. Any tips on how this can be done? Thanks very much in advance.

Hi @Blake_Camp, thanks for posting the question. You can try mixing DDP and RPC together: use RPC to hold the shared part of your model on one rank, and use DDP for the remaining part. See Combining Distributed DataParallel with Distributed RPC Framework — PyTorch Tutorials 1.9.1+cu102 documentation
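
To make the idea concrete, here is a condensed sketch in the spirit of that tutorial, not a drop-in solution: a shared `EmbeddingBag` is hosted on a parameter-server rank via `RemoteModule`, each trainer wraps its own local layer in DDP, and a `DistributedOptimizer` updates both the remote (shared) and local (per-trainer) parameters. The layer sizes, ports, and the 4-process layout (2 trainers, 1 parameter server, 1 master) are placeholders, and it assumes at least 2 GPUs:

```python
import torch
import torch.distributed as dist
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp
import torch.optim as optim
from torch.distributed.nn import RemoteModule
from torch.distributed.optim import DistributedOptimizer
from torch.distributed.rpc import RRef
from torch.nn.parallel import DistributedDataParallel as DDP

NUM_EMBEDDINGS, EMBEDDING_DIM = 100, 16  # placeholder sizes


class HybridModel(torch.nn.Module):
    """Shared embedding table lives on the parameter server (via RPC);
    the per-trainer fc layer is replicated and synced with DDP."""

    def __init__(self, remote_emb_module, device):
        super().__init__()
        self.remote_emb_module = remote_emb_module
        self.fc = DDP(torch.nn.Linear(EMBEDDING_DIM, 8).cuda(device), device_ids=[device])
        self.device = device

    def forward(self, indices, offsets):
        emb = self.remote_emb_module.forward(indices, offsets)  # executes on the ps rank
        return self.fc(emb.cuda(self.device))


def _run_trainer(remote_emb_module, rank):
    model = HybridModel(remote_emb_module, rank)
    # DistributedOptimizer needs RRefs to *all* parameters,
    # remote (shared) and local (DDP-wrapped) alike.
    param_rrefs = remote_emb_module.remote_parameters()
    param_rrefs += [RRef(p) for p in model.fc.parameters()]
    opt = DistributedOptimizer(optim.SGD, param_rrefs, lr=0.05)
    criterion = torch.nn.CrossEntropyLoss()

    for _ in range(5):  # dummy training loop with random data
        indices = torch.randint(0, NUM_EMBEDDINGS, (32,))
        offsets = torch.arange(0, 32, 4)              # 8 "bags" of 4 indices each
        target = torch.randint(0, 8, (8,)).cuda(rank)
        with dist_autograd.context() as context_id:
            loss = criterion(model(indices, offsets), target)
            dist_autograd.backward(context_id, [loss])
            opt.step(context_id)


def run_worker(rank, world_size):
    # ranks 0-1: trainers, rank 2: master, rank 3: parameter server
    opts = rpc.TensorPipeRpcBackendOptions()
    opts.init_method = "tcp://localhost:29501"

    if rank == 2:  # master: builds the shared module remotely, kicks off trainers
        rpc.init_rpc("master", rank=rank, world_size=world_size, rpc_backend_options=opts)
        remote_emb = RemoteModule(
            "ps", torch.nn.EmbeddingBag,
            args=(NUM_EMBEDDINGS, EMBEDDING_DIM), kwargs={"mode": "sum"},
        )
        futs = [rpc.rpc_async(f"trainer{r}", _run_trainer, args=(remote_emb, r)) for r in (0, 1)]
        [fut.wait() for fut in futs]
    elif rank <= 1:  # trainers: DDP process group + RPC
        dist.init_process_group(backend="gloo", rank=rank, world_size=2,
                                init_method="tcp://localhost:29500")
        rpc.init_rpc(f"trainer{rank}", rank=rank, world_size=world_size, rpc_backend_options=opts)
        # The trainer loop itself is launched by the master via rpc_async above.
    else:  # parameter server: just hosts the shared module
        rpc.init_rpc("ps", rank=rank, world_size=world_size, rpc_backend_options=opts)

    rpc.shutdown()  # block until all RPC work is done


if __name__ == "__main__":
    mp.spawn(run_worker, args=(4,), nprocs=4, join=True)
```

The key point is that gradients for the shared module flow back over RPC inside the `dist_autograd` context, while gradients for each trainer's DDP-wrapped layers are all-reduced across trainers as usual; `DistributedOptimizer.step(context_id)` then updates both sets.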


Also, if you could provide more details on your use case, that would help us see whether there are existing solutions :slight_smile:
