Using LR-Scheduler with param groups of different LR's

Hey,
I have defined the following optimizer with different learning rates for each parameter group:

  optimizer = optim.SGD([
          {'params': param_groups[0], 'lr': CFG.lr, 'weight_decay': CFG.weight_decay},
          {'params': param_groups[1], 'lr': 2*CFG.lr, 'weight_decay': 0},
          {'params': param_groups[2], 'lr': 10*CFG.lr, 'weight_decay': CFG.weight_decay},
          {'params': param_groups[3], 'lr': 20*CFG.lr, 'weight_decay': 0},
      ], lr=CFG.lr, momentum=0.9, weight_decay=CFG.weight_decay, nesterov=CFG.nesterov)
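
For context, here is a minimal sketch of how a param_groups list like the one above might be built. The model, the backbone/head split, and the CFG values are assumptions for illustration only, not the original setup:

import torch.nn as nn
import torch.optim as optim

class CFG:  # hypothetical config values, just for this sketch
    lr = 1e-3
    min_lr = 1e-6
    weight_decay = 1e-4
    nesterov = True

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
backbone, head = model[0], model[2]

# One common split: weights vs. biases for backbone and head,
# where biases get a larger lr and no weight decay.
param_groups = [
    [p for n, p in backbone.named_parameters() if not n.endswith('bias')],
    [p for n, p in backbone.named_parameters() if n.endswith('bias')],
    [p for n, p in head.named_parameters() if not n.endswith('bias')],
    [p for n, p in head.named_parameters() if n.endswith('bias')],
]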

Now I want to use an LR-Scheduler to update all the learning rates, not only the first one. By default, would a scheduler only update param_groups[0]?

scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=CFG.min_lr, last_epoch=-1, verbose=True)

Giving me:

Parameter Group 0
    dampening: 0
    initial_lr: 0.001
    lr: 0.0009999603905218616
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 1
    dampening: 0
    initial_lr: 0.002
    lr: 0.002
    momentum: 0.9
    nesterov: True
    weight_decay: 0

Parameter Group 2
    dampening: 0
    initial_lr: 0.01
    lr: 0.01
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 3
    dampening: 0
    initial_lr: 0.02
    lr: 0.02
    momentum: 0.9
    nesterov: True
    weight_decay: 0

after one update.

Any idea how to update all the learning rates with a scheduler?

Hi @Alexander_Riedel, did you find a way to do this?

I didn’t work on it anymore, but you can find some more hints here: python - PyTorch using LR-Scheduler with param groups of different LR's - Stack Overflow


@ptrblck Could you please look into this?

Thank you.

I don’t think the claim that a scheduler only updates param_groups[0] by default is true, since CosineAnnealingWarmRestarts iterates over all .param_groups, as seen here. Let me know if I’m missing something.
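
A quick way to check this is a toy sketch (the parameters, base learning rates, and values below are made up for illustration): after each scheduler.step(), every group's lr changes, each scaled relative to its own base lr.

import torch
import torch.nn as nn
import torch.optim as optim

# Four dummy parameters, one per group, with different base learning rates.
params = [nn.Parameter(torch.zeros(1)) for _ in range(4)]
base_lrs = [1e-3, 2e-3, 1e-2, 2e-2]
optimizer = optim.SGD(
    [{'params': [p], 'lr': lr} for p, lr in zip(params, base_lrs)],
    lr=1e-3, momentum=0.9,
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=2, eta_min=1e-6,
)

for epoch in range(3):
    optimizer.step()   # no gradients here, just keeps the recommended call order
    scheduler.step()
    print([group['lr'] for group in optimizer.param_groups])
# All four lrs follow the cosine schedule, each scaled by its own base lr.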

from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

opt = AdamW([
        {'params': [*self.block.parameters()], 'lr': 1e-3},
        {'params': self.bert.parameters(), 'lr': 1e-4},
    ])
sch = CosineAnnealingLR(opt, T_max=len_train_df / (batch_size * n_epochs), verbose=True)

In this case it will update all parameter groups. But I cannot set different schedulers for different param groups, right?


Any update on this?

What I am trying to do is use a scheduler only on the first dictionary passed to the optimizer definition; in other words, I want to apply the scheduler only to param_groups[0].

Is this possible?
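
One way to do this (a sketch with made-up parameters and an arbitrary decay factor, just to show the idea) is LambdaLR, which accepts one lr_lambda per param group, so group 0 can get a real schedule while the other groups get a constant factor of 1.0:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import LambdaLR

# Two dummy parameters, one per group.
params = [nn.Parameter(torch.zeros(1)) for _ in range(2)]
optimizer = optim.SGD(
    [{'params': [params[0]], 'lr': 1e-3},   # group 0: scheduled
     {'params': [params[1]], 'lr': 1e-4}],  # group 1: kept constant
    lr=1e-3, momentum=0.9,
)

# One lambda per param group: group 0 decays by 5% per epoch,
# group 1 is always multiplied by 1.0, so its lr never changes.
scheduler = LambdaLR(optimizer, lr_lambda=[
    lambda epoch: 0.95 ** epoch,
    lambda epoch: 1.0,
])

for epoch in range(3):
    optimizer.step()   # no gradients here, just keeps the recommended call order
    scheduler.step()
    print([group['lr'] for group in optimizer.param_groups])

The same pattern extends to four groups by passing four lambdas; the 0.95 decay is only an example, not something from this thread.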