How to implement torch.optim.lr_scheduler.CosineAnnealingLR?


I am trying to implement SGDR in my training but I am not sure how to implement it in PyTorch.

I want the learning rate to reset every epoch.

Here is my code:

model = ConvolutionalAutoEncoder().to(device)
# model = nn.DataParallel(model)
# Loss and optimizer
learning_rate = 0.1
weight_decay = 0.005
momentum = 0.9
# criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=momentum)
# optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, len(train_loader), eta_min=learning_rate)

params = list(model.parameters())
print(params[0].size())  # conv1's .weight

num_epochs = 30
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
#         data = Variable(data, requires_grad=True)
#         print(data)
        inp, targ = data
        inp = inp.to(device)
        targ = targ.to(device)
#         inp = Variable(inp, requires_grad=True).to(device)
#         targ = Variable(targ).to(device)

        output = model(inp)
#         scheduler.zero_grad()
        loss = F.binary_cross_entropy(output, targ)

        if i % 50 == 0:
            for param_group in optimizer.param_groups:
                print("Current learning rate is: {}".format(param_group['lr']))
            print("Epoch[{}/{}]({}/{}): Loss: {:.4f}".format(epoch+1,num_epochs, i, len(train_loader), loss.item()))

But I’m not seeing any change in the learning rate. Please help.



Since you are setting eta_min to the initial learning rate, your scheduler won’t be able to change the learning rate at all.
Set it to a low value or keep the default value of 0.
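
To see why eta_min matters, here is a minimal sketch of the closed-form cosine annealing formula given in the PyTorch docs, written in plain Python. The helper name cosine_annealing_lr and the value t_max = 10 are made up for illustration; this is not the library code itself.

```python
import math

def cosine_annealing_lr(base_lr, eta_min, t, t_max):
    """Closed-form cosine annealing value after t scheduler steps (illustrative helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max))

t_max = 10  # assumed number of scheduler steps in one annealing cycle

# eta_min equal to the base lr: the cosine term is multiplied by zero, so nothing changes.
flat = [cosine_annealing_lr(0.1, 0.1, t, t_max) for t in range(t_max + 1)]

# eta_min left at 0: the lr decays from the base lr down to 0 over t_max steps.
decayed = [cosine_annealing_lr(0.1, 0.0, t, t_max) for t in range(t_max + 1)]

print(min(flat), max(flat))
print(decayed[0], decayed[-1])
```

With eta_min equal to the initial learning rate, every entry of flat is the same value, which matches the constant learning rate in the question.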

Also, the scheduler will just manipulate the learning rate. It won’t update your model.
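Therefore you should call scheduler.step() once per epoch in the epoch loop, and keep optimizer.step() in the DataLoader loop. Note that since PyTorch 1.1.0, scheduler.step() is expected to come after that epoch's optimizer.step() calls rather than at the beginning of the epoch; calling it first skips the first value of the schedule.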



Thanks for the quick reply. I tried as you suggested but got a gradient explosion (see below).

Current learning rate is: 0.001
Epoch[1/30](0/69): Loss: 0.7209
Current learning rate is: 0.001
Epoch[1/30](50/69): Loss: 0.7745
Current learning rate is: 0.0009997718922447668
Epoch[2/30](0/69): Loss: 0.6944
Current learning rate is: 0.0009997718922447668
Epoch[2/30](50/69): Loss: 3.4404
Current learning rate is: 0.0009990877771116587
Epoch[3/30](0/69): Loss: 4.8132
Current learning rate is: 0.0009990877771116587
Epoch[3/30](50/69): Loss: 123.8725
Current learning rate is: 0.0009979482788085455
Epoch[4/30](0/69): Loss: 94.6003
Current learning rate is: 0.0009979482788085455
Epoch[4/30](50/69): Loss: 31125.7598
Current learning rate is: 0.000996354437049027
Epoch[5/30](0/69): Loss: 81586.2578
Current learning rate is: 0.000996354437049027
Epoch[5/30](50/69): Loss: 1193706.2500
Current learning rate is: 0.000994307706103767
Epoch[6/30](0/69): Loss: 1318053.0000
Current learning rate is: 0.000994307706103767
Epoch[6/30](50/69): Loss: 386166432.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](0/69): Loss: 465092672.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](50/69): Loss: 40667889664.0000
Current learning rate is: 0.0009888634581854234
Epoch[8/30](0/69): Loss: 42276134912.0000

Also, I want to reset the learning rate every epoch, but here it keeps decreasing from one epoch to the next.

Are you calling optimizer.zero_grad() somewhere in your training loop?
It looks like the gradients are being accumulated.
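
As a rough illustration of what accumulation means here, the sketch below runs backward() three times on a toy tensor, first without and then with optimizer.zero_grad(). The tensor w, the toy loss, and the lr value are assumptions made up for this example and are not taken from your model.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)  # only used for zero_grad() in this sketch

# Without zero_grad(): every backward() call adds to w.grad.
for _ in range(3):
    loss = (2.0 * w).sum()   # d(loss)/dw is 2 on every pass
    loss.backward()
accumulated = w.grad.clone()

# With zero_grad() before each backward(): w.grad holds only the latest gradient.
for _ in range(3):
    optimizer.zero_grad()
    loss = (2.0 * w).sum()
    loss.backward()
fresh = w.grad.clone()

print(accumulated, fresh)
```

In a real training loop the usual order per batch is optimizer.zero_grad(), loss.backward(), then optimizer.step().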


oh…right…missed that…thanks

However, can I implement SGDR with warm restarts in PyTorch?
Right now, it does not seem to be restarting. I want the cosine annealing to happen inside every epoch and then restart for the next epoch.

Would you like to lower the learning rate to its minimum in each epoch and then restart from the base learning rate?
If so, you could try the following code:

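import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

for epoch in range(5):
    for idx in range(steps):
        print(optimizer.param_groups[0]['lr'])
        optimizer.step()  # stands in for the real forward/backward/update
        scheduler.step()
    print('Reset scheduler')
    for param_group in optimizer.param_groups:
        param_group['lr'] = 1.  # restore the base lr so the restart does not depend on the PyTorch version
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)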

Note that the steps loop is basically your DataLoader loop.
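
As an alternative to re-creating the scheduler by hand, more recent PyTorch releases ship optim.lr_scheduler.CosineAnnealingWarmRestarts. The sketch below assumes such a release is installed and that one restart per epoch is wanted, so T_0 is set to the number of batches; the toy model and the steps value are placeholders.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.0)
steps = 10  # stands in for len(train_loader)

# T_0 is the number of scheduler steps before the first restart;
# with the default T_mult=1 every cycle has the same length.
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=steps, eta_min=0.0)

lrs = []
for epoch in range(3):
    for idx in range(steps):
        lrs.append(optimizer.param_groups[0]['lr'])
        optimizer.step()   # placeholder for the real forward/backward/update
        scheduler.step()   # advance the schedule once per batch

print(lrs[0], lrs[steps - 1], lrs[steps])
```

Here the learning rate recorded at the first batch of each epoch is back at the base value, and it is close to eta_min at the last batch of the epoch.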


Thanks a lot!!! Working perfectly now!!!


Hi~ Why does scheduler.get_lr()[0] change after we call scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hoping for your help, thank you!

What does print(optimizer.param_groups[0]['lr']) show? Is the value constant?
Could you post the code how you’ve initialized your lr_scheduler?

Thank you for your reply. My code is like this:

optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)  
print('Optimizer has been resumed from checkpoint...')

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1) 

for i in range(start_epoch):
    #  update the learning rate for start_epoch times
    scheduler.step()

def train(epoch):
    print('\n ############################# Train phase, Epoch: {} #############################'.format(epoch))
    train_loss = 0
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # Never changes

    for batch_idx, target_tuple in enumerate(train_loader):
        # do sth.....

I haven’t figured it out yet. Could you please help me? Thanks!

Your code looks alright. Do you see a constant learning rate or another issue?


Ah, it behaves normally now… scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] print the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you.


Hi, Jia_lee, I ran into the same issue. scheduler.get_lr()[0] changes every epoch, but optimizer.param_groups[0]['lr'] never changes. How did you fix this? Thank you.

In my case, it seems that the optimizer checkpoint influences the behavior of my learning rate scheduler.

Thanks for the reply. In my case, it turned out that printing the learning rate with %.3f shows only three decimal places, which is not enough to see the changes made by optim.lr_scheduler.CosineAnnealingLR(), especially when the number of epochs is large. Using %.6f or scientific notation should work.
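
A quick way to check this is to format two nearby learning rates with each specifier. The sketch below reuses two values from the training log posted earlier in this thread; the variable names are made up.

```python
lr_start = 0.001
lr_later = 0.0009888634581854234  # value taken from the log earlier in the thread

# Three decimals: both values print identically, so the schedule looks constant.
print('%.3f  %.3f' % (lr_start, lr_later))

# Six decimals or scientific notation: the difference becomes visible.
print('%.6f  %.6f' % (lr_start, lr_later))
print('%e  %e' % (lr_start, lr_later))
```

For very small changes between consecutive steps even six decimals may round to the same string, so scientific notation is the safer choice.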

Hi @ptrblck, does scheduler.step() change the lr of the params that were passed to the optimizer? If it does, how are we supposed to restart then?

I’m asking because I read the following lines in the source code, which made me wonder whether the state of the optimizer is changed by the call to scheduler.step():

for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
    param_group['lr'] = lr


Yes, the learning rate of each param_group of the optimizer will be changed.
If you want to reset the learning rate, you could use the same code and re-create the scheduler:

# Reset lr
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1, last_epoch=-1)
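
A small self-contained check of this reset could look like the sketch below. The toy model and the init_lr value are assumptions for illustration, and I have only reasoned through it rather than run it against every PyTorch version.

```python
import torch.nn as nn
import torch.optim as optim

init_lr = 1.0  # hypothetical base learning rate
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=init_lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

optimizer.step()   # placeholder for a real update
scheduler.step()   # writes the decayed lr into optimizer.param_groups
lr_after_step = optimizer.param_groups[0]['lr']

# Reset the lr stored in the optimizer, then build a fresh scheduler on top of it.
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
lr_after_reset = optimizer.param_groups[0]['lr']

print(lr_after_step, lr_after_reset)
```

The optimizer reads param_group['lr'] on every optimizer.step() call, so writing the value there is what feeds the new learning rate to the optimizer.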

Hi @ptrblck, I don’t think this is going to work now. I heard that the optimizer also needs to be fed the new learning rate for this to work. Is there any documentation on how to achieve this in PyTorch? The way to do it in Keras seems very straightforward.

Could you link to the discussion, so that I can have a look?
In my example the learning rate of all param_groups will be reset, but I didn’t verify the code on the current master.