Why do layer-specific learning rates need to be set in the optimizer, not in the model definition?

I want to use a specific learning rate for each layer. In Caffe and Lua Torch, this can be done while constructing the model file. But in PyTorch, it seems that layer-specific learning rates have to be set through the optimizer.

For example, to set a specific learning rate for the STN layers:

import torch

# Build per-parameter option groups: STN parameters get their own lr,
# everything else falls back to the optimizer defaults set below.
lr_policy = []
for name, param in model.named_parameters():
    if 'stn' in name:
        # Options given in a group override the optimizer defaults.
        lr_policy.append({'params': param, 'lr': 0.01,
                          'weight_decay': args.weight_decay})
    else:
        lr_policy.append({'params': param})

optimizer = torch.optim.SGD(lr_policy,
                            lr=args.lr,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)

While the above code works, it is neither general nor convenient. I think the learning-rate policy should be tied to the network architecture, not to the optimizer. When the network changes, the learning-rate policy may need to change with it.

For example, take this main.py and the models file: they are defined in independent source files. If different models need their own per-layer learning rates, the optimizer code in main.py shouldn't have to be touched.
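One way to keep the policy next to the architecture today is to have each model expose its own parameter groups and let main.py just forward them to the optimizer. This is only a minimal sketch with made-up names (MyModelWithSTN and the param_groups method are assumptions, not an existing PyTorch API):

import torch
import torch.nn as nn

class MyModelWithSTN(nn.Module):  # hypothetical model; name is an assumption
    def __init__(self):
        super().__init__()
        self.stn = nn.Linear(10, 10)       # stand-in for the STN sub-module
        self.backbone = nn.Linear(10, 10)  # stand-in for the rest of the net

    def param_groups(self, base_lr, weight_decay):
        # The per-layer learning-rate policy lives in the model file:
        # STN parameters get a fixed lr, everything else uses base_lr.
        return [
            {'params': self.stn.parameters(), 'lr': 0.01,
             'weight_decay': weight_decay},
            {'params': self.backbone.parameters(), 'lr': base_lr},
        ]

# In main.py the optimizer code stays generic for every model:
model = MyModelWithSTN()
optimizer = torch.optim.SGD(model.param_groups(base_lr=0.1, weight_decay=1e-4),
                            lr=0.1, momentum=0.9, weight_decay=1e-4)

With this pattern, swapping in a different model only requires that it provide its own param_groups; the optimizer construction in main.py never changes.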