Hi All,
1. Let's suppose I have a model that I want to train with DistributedDataParallel, so I wrap it as follows:
from torch.nn.parallel import DistributedDataParallel as DDP
ddp_model = DDP(model, device_ids=[device])
2. I initialize my optimizer as follows:
import torch.optim as optim
optimizer = optim.SGD(ddp_model.parameters(), lr=1e-2)
Is there a way to modify step 2 to apply per-parameter optimizer options? What would the following look like given the DDP-wrapped model?
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
As described at https://pytorch.org/docs/stable/optim.html#per-parameter-options
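My current guess is the sketch below, which assumes the original submodules stay reachable through ddp_model.module after wrapping (base and classifier are just the placeholder names from the docs example, not real attributes of my model):
# Unverified sketch: reach the underlying model's submodules via ddp_model.module
optimizer = optim.SGD([
    {'params': ddp_model.module.base.parameters()},
    {'params': ddp_model.module.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Does that work, or is there a recommended way to build parameter groups from a DDP-wrapped model?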
Thanks!