Based on http://pytorch.org/docs/master/optim.html#per-parameter-options, we can set a per-parameter-group learning rate. I was wondering if we can do the same with weight_decay or other options, as follows:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'weight_decay': 1e-3}
], lr=1e-2, momentum=0.9, weight_decay=0.0)
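To check this, the snippet above can be made into a self-contained runnable sketch. The `Net` class with `base` and `classifier` submodules is a stand-in for a real model; per-group dicts override the top-level defaults, and groups that omit an option fall back to the default:

```python
import torch.nn as nn
import torch.optim as optim

# Toy model with "base" and "classifier" submodules (stand-ins for a real model).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(10, 5)
        self.classifier = nn.Linear(5, 2)

model = Net()

# Any optimizer option (lr, momentum, weight_decay, ...) can be overridden
# inside a group dict; groups that omit it use the top-level default.
optimizer = optim.SGD([
    {'params': model.base.parameters()},  # uses default weight_decay=0.0
    {'params': model.classifier.parameters(), 'weight_decay': 1e-3},
], lr=1e-2, momentum=0.9, weight_decay=0.0)

print([g['weight_decay'] for g in optimizer.param_groups])  # → [0.0, 0.001]
```

Inspecting `optimizer.param_groups` shows each group's resolved options, so you can confirm the override applied.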