Cyclic Learning Rate Max LR

Can the max_lr value for OneCycleLR be less than the lr that is used for optim.Adam?
I used the values below (lr=0.002 and max_lr=0.0005, which I set by mistake), and that seemed to give better results than a max_lr of 0.005.
A max_lr of 0.005 did not seem to give better results.

optimizer = optim.Adam(list(net.parameters()), lr=0.002)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, steps_per_epoch=len(dataloaders['train']), epochs=total_epocs_var)

Yes, you can set max_lr to a smaller value than the optimizer's original learning rate; it then becomes the new upper limit, as seen here:

import torch
import torch.nn as nn

net = nn.Linear(1, 1)
optimizer = torch.optim.Adam(net.parameters(), lr=0.002)
# max_lr caps the cycle even though it's lower than the optimizer's lr
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, steps_per_epoch=2, epochs=10)

for epoch in range(20):
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())

[6.58359213500126e-05]
[0.00018583592135001264]
[0.00033416407864998737]
[0.00045416407864998736]
[0.0005]
[0.0004937320031175437]
[0.0004752423160067368]
[0.00044545808878552497]
[0.0004058728269748815]
[0.0003584715008956504]
[0.0003056310109681446]
[0.00025000099999999997]
[0.00019437098903185538]
[0.0001415304991043496]
[9.412917302511849e-05]
[5.454391121447504e-05]
[2.4759683993263143e-05]
[6.269996882456277e-06]
[2e-09]
[6.269996882456277e-06]

Thanks @ptrblck for your kind reply.
I will experiment with the code block that you have shared and see the difference when the lr is also small, like 0.0005, versus a large lr of 0.002 (a sketch of that comparison follows below).
I guess that the lr might start with the larger value and then settle into the maximum value set by max_lr.

What could be the advantages of such a setting? (Or are there any drawbacks to such a setting?)
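A minimal sketch for that comparison, with the helper name compare_schedules used purely for illustration: it builds two Adam optimizers (lr=0.0005 and lr=0.002), attaches OneCycleLR with the same max_lr=0.0005 to each, and prints both schedules side by side. As far as I can tell, OneCycleLR derives its starting learning rate from max_lr / div_factor rather than from the optimizer's lr, so I would expect the two printed schedules to match.

import torch
import torch.nn as nn

def compare_schedules(lr):
    # hypothetical helper: collect the OneCycleLR schedule for a given optimizer lr
    net = nn.Linear(1, 1)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.0005, steps_per_epoch=2, epochs=10)
    lrs = []
    for _ in range(20):
        optimizer.step()
        scheduler.step()
        lrs.append(scheduler.get_last_lr()[0])
    return lrs

for small, large in zip(compare_schedules(0.0005), compare_schedules(0.002)):
    print(small, large)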

One drawback could be slightly confusing code, as the optimizer defines a different learning rate than the one the scheduler allows. Besides that, the scheduler will drive the learning rate, so I don't think there are other issues.
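
To make that concrete, here is a minimal sketch (assuming the default div_factor=25) showing that the scheduler takes over the learning rate as soon as it is constructed: the 0.002 passed to Adam is replaced by the scheduler's starting value of max_lr / div_factor.

import torch
import torch.nn as nn

net = nn.Linear(1, 1)
optimizer = torch.optim.Adam(net.parameters(), lr=0.002)
print(optimizer.param_groups[0]['lr'])  # 0.002, the value passed to Adam

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, steps_per_epoch=2, epochs=10)
print(optimizer.param_groups[0]['lr'])  # 2e-05, i.e. max_lr / div_factor (default 25)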