Are optimizer per-parameter options supported with DistributedDataParallel?

Hi All,

  1. Let's suppose I have a model that I want to train using DistributedDataParallel. I wrap my model with DistributedDataParallel as follows:
ddp_model = DDP(model, device_ids=[device])
  2. I initialize my optimizer as follows:
optimizer = optim.SGD(ddp_model.parameters(), lr=1e-2)

Is there a way to modify step 2 to apply per-parameter optimizer options? What would the following look like given the DDP-wrapped model?

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

As described at https://pytorch.org/docs/stable/optim.html#per-parameter-options

Thanks!

I believe per-parameter options should be supported by DistributedDataParallel. Have you tried it out and seen any issues? If you do see issues or unexpected behavior, feel free to open an issue on GitHub.
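
For reference, here is a minimal sketch of what step 2 could look like with per-parameter groups, assuming the model exposes base and classifier submodules and that the process group, model, and device are already set up. DDP keeps the wrapped module reachable as ddp_model.module, so the groups can be built from its submodules:

import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group(...) has been called,
# and that `model` and `device` are already defined on this rank.
ddp_model = DDP(model, device_ids=[device])

# Build per-parameter groups from the wrapped module's submodules.
optimizer = optim.SGD([
    {'params': ddp_model.module.base.parameters()},
    {'params': ddp_model.module.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Since DDP wraps the original module rather than copying its parameters, model.base.parameters() and ddp_model.module.base.parameters() refer to the same tensors, so building the groups directly from model should work as well.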