Based on http://pytorch.org/docs/master/optim.html#per-parameter-options, we can set a per-parameter-group learning rate. I was wondering if we can do the same with weight_decay or other options, as follows:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'weight_decay': 1e-3}
], lr=1e-2, momentum=0.9, weight_decay=0.0)
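To check this, the snippet above can be made into a self-contained runnable sketch. The `Net` class with `base` and `classifier` submodules is a stand-in for a real model; per-group dicts override the top-level defaults, and groups that omit an option fall back to the default:

```python
import torch.nn as nn
import torch.optim as optim

# Toy model with "base" and "classifier" submodules (stand-ins for a real model).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(10, 5)
        self.classifier = nn.Linear(5, 2)

model = Net()

# Any optimizer option (lr, momentum, weight_decay, ...) can be overridden
# inside a group dict; groups that omit it use the top-level default.
optimizer = optim.SGD([
    {'params': model.base.parameters()},  # uses default weight_decay=0.0
    {'params': model.classifier.parameters(), 'weight_decay': 1e-3},
], lr=1e-2, momentum=0.9, weight_decay=0.0)

print([g['weight_decay'] for g in optimizer.param_groups])  # → [0.0, 0.001]
```

Inspecting `optimizer.param_groups` shows each group's resolved options, so you can confirm the override applied.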