I would like to train multiple models, each on a separate GPU, using DDP. The challenge is that each model has its own unique parameters, plus a set of common parameters shared by all of the models. Any tips on how this can be done? Thanks very much in advance.