Confusion with LR Scheduler get_lr()

import torch

test = torch.autograd.Variable(torch.randn([5, 5]), requires_grad=True)
optimizer = torch.optim.Adam([test], lr=0.0001)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
lr_scheduler.get_lr()

This code snippet gives me a learning rate of 0.001.

I want to start the optimizer with LR = 1e-4, and schedule a decay of 0.1 every 100 epochs.
It seems like the LR Scheduler starts me off at 1e-3 instead.
Can someone clarify?

This happens on PyTorch 0.3.1 and 0.4.0.

It's a bit strange that the initial learning rate seems too high, but it's consistent with the example in the docs:

scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100): 
    scheduler.step() 
    train(...) 
    validate(...)

Since you call scheduler.step() first in each epoch, the learning rate will be set to 1e-4 for the first step_size epochs.
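To make the ordering concrete, here is a pure-Python sketch that mimics StepLR in the docs loop rather than calling it. It assumes, as in the 0.3.1/0.4.0 code, that the constructor leaves last_epoch at -1 and that each scheduler.step() increments last_epoch and then sets the rate to base_lr * gamma ** (last_epoch // step_size). The helper names step_lr and simulate_step_first_loop are hypothetical.

```python
def step_lr(base_lr, gamma, step_size, last_epoch):
    """Closed-form StepLR rate: base_lr * gamma ** (last_epoch // step_size)."""
    return base_lr * gamma ** (last_epoch // step_size)


def simulate_step_first_loop(base_lr, gamma, step_size, num_epochs):
    """Mimic the docs loop, where scheduler.step() runs before train(...) each epoch."""
    last_epoch = -1  # assumed value left behind by the constructor
    lrs = []
    for epoch in range(num_epochs):
        last_epoch += 1  # what scheduler.step() does first
        # rate that train(...) would see during this epoch
        lrs.append(step_lr(base_lr, gamma, step_size, last_epoch))
    return lrs


lrs = simulate_step_first_loop(base_lr=1e-4, gamma=0.1, step_size=100, num_epochs=300)
print(lrs[0], lrs[99], lrs[100], lrs[299])
```

With base_lr=1e-4 and step_size=100, epochs 0 to 99 train at 1e-4, epochs 100 to 199 at 1e-5, and epochs 200 to 299 at 1e-6.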

import torch

test = torch.autograd.Variable(torch.randn([5, 5]), requires_grad=True)
optimizer = torch.optim.Adam([test], lr=0.0001)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
lr_scheduler.step()
lr_scheduler.get_lr()

Yeah. The behaviour is weird. If I call step() before get_lr(), the output looks correct.
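As far as I can tell from the 0.3.1/0.4.0 source, the scheduler constructor calls step(last_epoch + 1), which writes the correct 1e-4 into the optimizer, and then resets self.last_epoch to -1. A later get_lr() therefore evaluates the StepLR formula at epoch -1. Python's floor division rounds toward negative infinity, so -1 // 100 == -1 and the result is base_lr * gamma ** -1 = 1e-3. The sketch below reproduces only that arithmetic; step_lr is a hypothetical helper, not a PyTorch function.

```python
def step_lr(base_lr, gamma, step_size, last_epoch):
    """Closed-form StepLR rate: base_lr * gamma ** (last_epoch // step_size)."""
    return base_lr * gamma ** (last_epoch // step_size)


base_lr, gamma, step_size = 1e-4, 0.1, 100

# Right after construction (assumed 0.3.1/0.4.0 behaviour): last_epoch == -1.
# -1 // 100 == -1, so the exponent is -1 and gamma ** -1 == 10.
before_step = step_lr(base_lr, gamma, step_size, last_epoch=-1)

# After one scheduler.step(): last_epoch == 0, so the exponent is 0 // 100 == 0.
after_step = step_lr(base_lr, gamma, step_size, last_epoch=0)

print(before_step, after_step)
```

Under this reading, the optimizer was already at 1e-4 and only get_lr() reported 1e-3. Printing optimizer.param_groups[0]['lr'] shows the rate actually in use. In PyTorch 1.4 and later, scheduler.get_last_lr() returns the last rate the scheduler applied.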

I am seeing the same issue with the LR scheduler. I want it to start from 1e-05, but it starts from 1e-03.
Another thing I found is that if I increase the step size, it starts from 1e-04 instead.
Has anyone else run into this, and how did you solve it?

Could you post a minimal code snippet to reproduce your issue?

Training Settings

import torch.optim as optim

base_lr = 0.000001  # hyperparameter
max_lr = 1  # hyperparameter

def main():
    print('…Main Function starts…')
    # model and momentum are defined elsewhere in the original script
    optimizer = optim.SGD(model.parameters(), lr=base_lr, momentum=momentum)
    scheduler = optim.lr_scheduler.CyclicLR(optimizer=optimizer, base_lr=base_lr, max_lr=max_lr, step_size_up=100, step_size_down=100)

Training function

import torch

def train(model, optimizer, scheduler, plot_loss, plot_lr, training_dataloader, batch_size, grids, grids_values, bounding_boxes, classes, lambda_coord, lambda_noobject):
    model.train()
    batch_loss = 0
    for batch_index, batched_sample in enumerate(training_dataloader):
        print('Current LR : {}'.format(scheduler.get_lr()))
        batched_image = torch.tensor(batched_sample['image'], requires_grad=True, dtype=torch.float)
        batched_label = torch.tensor(batched_sample['label'], requires_grad=True, dtype=torch.float)
        batched_output = model(batched_image)
        # Convert the output size into [N X GRIDS(225 X 225) X (5 * B + C)]
        batched_output = batched_output.view(batch_size, grids, grids_values)
        # yolo_loss is defined elsewhere in the original script
        loss = yolo_loss(batched_output, batched_label, grids, bounding_boxes, classes, lambda_coord, lambda_noobject)
        batch_loss = batch_loss + loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # called once per batch, so the step sizes count batches
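The loop above calls scheduler.step() once per batch, so step_size_up and step_size_down count batches, not epochs. The pure-Python sketch below mirrors the formula CyclicLR uses in its default triangular mode (amplitude not scaled), so the printed values can be checked by hand. triangular_lr is a hypothetical helper, and iteration stands for the number of scheduler.step() calls made so far.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size_up, step_size_down):
    """Learning rate of a triangular cycle after `iteration` scheduler steps."""
    total_size = step_size_up + step_size_down
    step_ratio = step_size_up / total_size
    cycle = math.floor(1 + iteration / total_size)
    x = 1.0 + iteration / total_size - cycle  # position within the current cycle, in [0, 1)
    if x <= step_ratio:
        scale = x / step_ratio  # rising half: 0 -> 1
    else:
        scale = (x - 1) / (step_ratio - 1)  # falling half: 1 -> 0
    return base_lr + (max_lr - base_lr) * scale


base_lr, max_lr = 1e-6, 1.0
for up in (100, 500):
    first_three = [triangular_lr(i, base_lr, max_lr, up, 100) for i in range(3)]
    print(up, first_three)
```

Under these assumptions, the first value is base_lr in both configurations, and each step on the rising half adds (max_lr - base_lr) / step_size_up: about 0.01 with step_size_up=100 and about 0.002 with step_size_up=500. Changing the step size changes the increment, not the starting point.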

With these settings, the output is:

After changing only step_size_up to 500, the output is:

  1. It is not starting from the base_lr.
  2. Why does the LR scheduler start from a different base_lr when I change the step size?
  3. To work around this, I have to run two cycles: one from 1e-06 to 1e-04, and another from 1e-04 to 1. Is there some range limitation in the LR scheduler?
  4. Is this a bug, or am I implementing it the wrong way?

I might be misunderstanding the output, but the learning rate is starting from the base_lr=1e-6 in both screenshots.

Correct me if I am wrong. The first print shows base_lr = 1e-06, before scheduler.step() has been called. After calling the scheduler, the rate jumps to 0.01 and then increases by about 0.01 each step. I expected it to start from 1e-06 after calling the scheduler and scale accordingly from there.
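The jump to roughly 0.01 is the linear ramp behaving as configured: with base_lr=1e-6 and max_lr=1, a single step of (1 - 1e-6) / 100 already moves past every rate below 1e-2. If the goal is to visit each order of magnitude between 1e-6 and 1 (point 3 in the list above), log-spaced rates are one alternative. The sketch compares the two spacings in pure Python; linear_sweep and geometric_sweep are hypothetical helpers. In PyTorch, a geometric ramp like this could plausibly be expressed with LambdaLR, whose lr_lambda returns a multiplier on the initial rate (for example lambda step: factor ** step), but treat that as a suggestion rather than a tested recipe.

```python
def linear_sweep(base_lr, max_lr, num_steps):
    """Rising half of a triangular cycle: equal additive increments."""
    step = (max_lr - base_lr) / num_steps
    return [base_lr + i * step for i in range(num_steps + 1)]


def geometric_sweep(base_lr, max_lr, num_steps):
    """Alternative ramp: equal multiplicative increments (log-spaced rates)."""
    factor = (max_lr / base_lr) ** (1.0 / num_steps)
    return [base_lr * factor ** i for i in range(num_steps + 1)]


lin = linear_sweep(1e-6, 1.0, 100)
geo = geometric_sweep(1e-6, 1.0, 100)

# How many of the 101 rates in each sweep stay below 1e-4?
print(sum(lr < 1e-4 for lr in lin), sum(lr < 1e-4 for lr in geo))
```

Only the first of the 101 linear rates is below 1e-4, while the geometric sweep spends 34 of its 101 steps there and still ends at 1.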
