The following two pieces of code produce different training results, and I don't understand why. My understanding is that named_parameters() returns all model parameters, trainable or not. What is the difference between specifying the learning rate for each parameter group individually versus once in the SGD constructor?
Code #1:
def get_optimizer(self):
    lr = opt.lr
    params = []
    # Build one parameter group per trainable parameter.
    for key, value in dict(self.named_parameters()).items():
        if value.requires_grad:
            if 'bias' in key:
                params += [{'params': [value], 'lr': lr * 1, 'weight_decay': 0}]
            else:
                params += [{'params': [value], 'lr': lr, 'weight_decay': 0}]
    self.optimizer = t.optim.SGD(params, momentum=0.9)
    return self.optimizer
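To make the comparison concrete, here is a minimal sketch (using an assumed toy `torch.nn.Linear` model, not the model from the question) showing the two ways of supplying the learning rate. A per-group 'lr' overrides the optimizer-level default, so when every group sets the same value, the effective learning rates match:

```python
import torch

model = torch.nn.Linear(4, 2)
lr = 0.01

# Variant A: lr specified inside each parameter group, as in the code above.
params = [{'params': [p], 'lr': lr, 'weight_decay': 0}
          for p in model.parameters() if p.requires_grad]
opt_a = torch.optim.SGD(params, momentum=0.9)

# Variant B: lr specified once in the constructor; every group inherits it.
opt_b = torch.optim.SGD(model.parameters(), lr=lr,
                        momentum=0.9, weight_decay=0)

# Both optimizers end up with the same effective lr in every param group.
print(all(g['lr'] == lr for g in opt_a.param_groups))
print(all(g['lr'] == lr for g in opt_b.param_groups))
```

If the two variants still train differently, the cause is usually something other than the lr placement itself, e.g. a group that sets a different 'lr' or 'weight_decay' than the constructor default.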