How to enable DDP to train a structure-changing model?

My goal is to accelerate 3D-Gaussian-Splatting (3DGS) model training by distributing it across multiple GPUs.

While DDP’s API is concise and elegant, I find it quite hard to move from the MNIST DDP example to enabling DDP on 3DGS. The biggest hurdle, I think, is this:

Unlike most cases, where the model’s hyperparameters (its structure) are fixed, a 3DGS model’s learnable parameters are its 3D points, which need to be densified & pruned during training, so the parameter set itself changes shape. This is why I call it a “structure-changing model” in the title. A toy sketch of what I mean is below.
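(This is not the real 3DGS code, just a minimal stand-in to show the problem: the parameter tensor is replaced with one of a different shape mid-training, so anything built around the old parameters — optimizer state, DDP gradient buckets — no longer matches.)

```python
import torch
import torch.nn as nn

class ToyGaussians(nn.Module):
    def __init__(self, num_points: int):
        super().__init__()
        # stand-in for the 3DGS point attributes (positions, scales, opacities, ...)
        self.xyz = nn.Parameter(torch.randn(num_points, 3))

    def densify(self, new_points: torch.Tensor):
        # parameter count grows -> old optimizer state / DDP bucket layout is stale
        self.xyz = nn.Parameter(torch.cat([self.xyz.data, new_points], dim=0))

    def prune(self, keep_mask: torch.Tensor):
        # parameter count shrinks -> same problem
        self.xyz = nn.Parameter(self.xyz.data[keep_mask])
```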

It seems to me that the core problem is how to correctly re-sync the models once they have diverged across GPUs (e.g. after each rank densifies/prunes on its own). Something like the sketch below is what I have in mind. Any advice to get me started?
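This is only a rough sketch of what I imagine, not something I have working. It assumes rank 0 decides which points to add/remove, then broadcasts its state and the model is re-wrapped in DDP so the gradient buckets are rebuilt (the `ToyGaussians` class is the toy model from above; the optimizer would also need to be rebuilt outside this function):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def resync_after_densify(model: ToyGaussians, device: torch.device) -> DDP:
    # 1) make every rank agree on the new parameter shape (rank 0 is the source of truth)
    shape = [list(model.xyz.shape)]
    dist.broadcast_object_list(shape, src=0)
    if dist.get_rank() != 0:
        model.xyz = torch.nn.Parameter(torch.empty(shape[0], device=device))

    # 2) copy rank 0's parameter values to all other ranks
    dist.broadcast(model.xyz.data, src=0)

    # 3) re-wrap in DDP so its internal state is rebuilt for the new parameter set
    return DDP(model, device_ids=[device.index])
```

Is re-wrapping DDP after every densify/prune step the intended way to handle this, or is there a cheaper / more idiomatic approach?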