Are optimizer per-parameter options supported with DistributedDataParallel?

Hi All,

  1. Let's suppose I have a model that I want to train using DistributedDataParallel. I wrap my model with DistributedDataParallel as follows:
ddp_model = DDP(model, device_ids=[device])
  2. I initialize my optimizer as follows:
optimizer = optim.SGD(ddp_model.parameters(), lr=1e-2)

Is there a way to modify step 2 to apply per-parameter optimizer options? What would the following look like given the DDP-wrapped model?

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

As described at https://pytorch.org/docs/stable/optim.html#per-parameter-options

Thanks!

I believe per-parameter options should be supported by DistributedDataParallel. Have you tried it out and seen any issues? If you do see issues or unexpected behavior, feel free to open an issue on GitHub.
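
For reference, here is a minimal sketch of what step 2 could look like with per-parameter groups, assuming the model exposes base and classifier submodules and that the process group, model, and device are already set up. DDP keeps the wrapped module reachable as ddp_model.module, so the groups can be built from its submodules:

import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group(...) has been called,
# and that `model` and `device` are already defined on this rank.
ddp_model = DDP(model, device_ids=[device])

# Build per-parameter groups from the wrapped module's submodules.
optimizer = optim.SGD([
    {'params': ddp_model.module.base.parameters()},
    {'params': ddp_model.module.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Since DDP wraps the original module rather than copying its parameters, model.base.parameters() and ddp_model.module.base.parameters() refer to the same tensors, so building the groups directly from model should work as well.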