Understanding per-layer learning rates with a scheduler

Hey everybody,

I’m trying to use an LR scheduler for transfer learning. My backbone (an efficientnet_v2) should also be fine-tuned, but with lower learning rates than the classifier. AFAIK, for a fixed learning rate at least, this could be done like this:

import torch.optim as optim

LR = 1e-3

# `model` is assumed to be defined elsewhere.
# One parameter group per block, with smaller learning rates for earlier layers.
params = [
    {'params': model.conv1.parameters(), 'lr': LR / 10},
    {'params': model.bn1.parameters(), 'lr': LR / 10},
    {'params': model.layer1.parameters(), 'lr': LR / 8},
    {'params': model.layer2.parameters(), 'lr': LR / 6},
    {'params': model.layer3.parameters(), 'lr': LR / 4},
    {'params': model.layer4.parameters(), 'lr': LR / 2},
    {'params': model.fc.parameters()},  # no 'lr' key: uses the optimizer default below
]

optimizer = optim.Adam(params, lr=LR)

I now want to use OneCycleLR. How do I get the scheduler to respect the LR fractions defined above? How do the scheduler and the optimizer interact? Does the scheduler update a single variable (as yet unknown to me), or does it update the ‘lr’ field of every parameter group?

The learning rate schedulers iterate over all .param_groups of the optimizer, as seen here, so each group’s ‘lr’ entry is updated separately.
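
For OneCycleLR specifically, max_lr accepts either a single float or a list with one value per parameter group, so the fractions can be passed in there. Here is a minimal sketch of that idea. The tiny two-layer model, the backbone/classifier split, and total_steps = 100 are made up for illustration and are not taken from the original post. As far as I can tell, OneCycleLR derives each group's schedule from its max_lr entry, so the 'lr' values given to the optimizer only act as placeholders.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import OneCycleLR

LR = 1e-3

# Hypothetical stand-in for a backbone plus classifier head.
model = nn.Sequential(
    nn.Linear(8, 16),  # plays the role of the backbone
    nn.ReLU(),
    nn.Linear(16, 2),  # plays the role of the classifier
)
backbone, classifier = model[0], model[2]

# Two parameter groups; the per-group peaks are set via max_lr below.
params = [
    {'params': backbone.parameters()},
    {'params': classifier.parameters()},
]
optimizer = optim.Adam(params, lr=LR)

total_steps = 100  # assumed number of optimizer steps for this sketch
scheduler = OneCycleLR(
    optimizer,
    max_lr=[LR / 10, LR],  # one peak learning rate per parameter group
    total_steps=total_steps,
)

backbone_lrs, head_lrs = [], []
for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).sum()  # dummy loss on random data
    loss.backward()
    optimizer.step()

    # The scheduler writes the current value into each group's 'lr' field.
    backbone_lrs.append(optimizer.param_groups[0]['lr'])
    head_lrs.append(optimizer.param_groups[1]['lr'])
    scheduler.step()

# Ratio between the two groups at every recorded step.
ratios = [b / h for b, h in zip(backbone_lrs, head_lrs)]
```

Since every point of the one-cycle curve scales with max_lr, the ratio between the two groups should stay at the chosen fraction for the whole run. Printing optimizer.param_groups[i]['lr'] during training is an easy way to check this on a real model.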