How to implement torch.optim.lr_scheduler.CosineAnnealingLR?

Hi,

I am trying to use SGDR (SGD with warm restarts) in my training, but I am not sure how to implement it in PyTorch.

I want the learning rate to reset every epoch.

Here is my code:

model = ConvolutionalAutoEncoder().to(device)
# model = nn.DataParallel(model)
# Loss and optimizer
learning_rate = 0.1
weight_decay = 0.005
momentum = 0.9
# criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=momentum)
# optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, len(train_loader), eta_min=learning_rate)

params = list(model.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

num_epochs = 30
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
#         data = Variable(data, requires_grad=True)
#         print(data)
        inp, targ = data
        inp = inp.to(device)
        targ = targ.to(device)
#         inp = Variable(inp, requires_grad=True).to(device)
#         targ = Variable(targ).to(device)

        output = model(inp)
#         scheduler.zero_grad()
        loss = F.binary_cross_entropy(output, targ)

        loss.backward()
        scheduler.step()
        
        if i % 50 == 0:
            for param_group in optimizer.param_groups:
                print("Current learning rate is: {}".format(param_group['lr']))
            print("Epoch[{}/{}]({}/{}): Loss: {:.4f}".format(epoch+1,num_epochs, i, len(train_loader), loss.item()))
    

But I’m not seeing any change in the learning rate. Please help.

Thanks


Since you are setting eta_min to the initial learning rate, your scheduler won’t be able to change the learning rate at all.
Set it to a low value or keep the default value of 0.
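For example, something like this (keeping T_max = len(train_loader) from your snippet, just with the default eta_min) would let the learning rate actually anneal:

scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=len(train_loader), eta_min=0)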

Also, the scheduler will just manipulate the learning rate. It won’t update your model.
Therefore you should call scheduler.step() once per epoch in the outer loop, and call optimizer.step() for every batch in the DataLoader loop.
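A minimal sketch of that structure, reusing the names from your snippet (note that recent PyTorch versions expect scheduler.step() to be called after optimizer.step()):

for epoch in range(num_epochs):
    for i, (inp, targ) in enumerate(train_loader):
        inp, targ = inp.to(device), targ.to(device)

        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss = F.binary_cross_entropy(model(inp), targ)
        loss.backward()
        optimizer.step()       # the optimizer updates the model parameters

    scheduler.step()           # the scheduler only adjusts the learning rate, once per epoch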


Hi,

Thanks for the quick reply. I tried as you suggested but got a gradient explosion (see below).

Current learning rate is: 0.001
Epoch[1/30](0/69): Loss: 0.7209
Current learning rate is: 0.001
Epoch[1/30](50/69): Loss: 0.7745
Current learning rate is: 0.0009997718922447668
Epoch[2/30](0/69): Loss: 0.6944
Current learning rate is: 0.0009997718922447668
Epoch[2/30](50/69): Loss: 3.4404
Current learning rate is: 0.0009990877771116587
Epoch[3/30](0/69): Loss: 4.8132
Current learning rate is: 0.0009990877771116587
Epoch[3/30](50/69): Loss: 123.8725
Current learning rate is: 0.0009979482788085455
Epoch[4/30](0/69): Loss: 94.6003
Current learning rate is: 0.0009979482788085455
Epoch[4/30](50/69): Loss: 31125.7598
Current learning rate is: 0.000996354437049027
Epoch[5/30](0/69): Loss: 81586.2578
Current learning rate is: 0.000996354437049027
Epoch[5/30](50/69): Loss: 1193706.2500
Current learning rate is: 0.000994307706103767
Epoch[6/30](0/69): Loss: 1318053.0000
Current learning rate is: 0.000994307706103767
Epoch[6/30](50/69): Loss: 386166432.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](0/69): Loss: 465092672.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](50/69): Loss: 40667889664.0000
Current learning rate is: 0.0009888634581854234
Epoch[8/30](0/69): Loss: 42276134912.0000

Also, I want the learning rate to reset every epoch, but here it is only decreasing every epoch.

Are you calling optimizer.zero_grad() somewhere in your training loop?
It looks like the gradients are being accumulated.
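To see the accumulation in isolation, here is a tiny self-contained illustration (not part of your model):

import torch

w = torch.tensor(1.0, requires_grad=True)
(w * 2).backward()
print(w.grad)   # tensor(2.)
(w * 2).backward()
print(w.grad)   # tensor(4.) -- gradients add up across backward() calls
w.grad.zero_()  # optimizer.zero_grad() does this for every parameter
print(w.grad)   # tensor(0.)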


Oh… right… I missed that… thanks!

However, can I implement SGDR with warm restarts in PyTorch?
Right now the learning rate does not seem to be restarting. I want the cosine annealing to happen within every epoch and then restart for the next epoch.

Would you like to lower the learning rate to its minimum in each epoch and then restart from the base learning rate?
If so, you could try the following code:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

for epoch in range(5):
    for idx in range(steps):
        scheduler.step()
        print(scheduler.get_lr())
    
    print('Reset scheduler')
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

Note that the steps loop is basically your DataLoader loop.
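As a side note, newer PyTorch versions also ship torch.optim.lr_scheduler.CosineAnnealingWarmRestarts, which implements the SGDR restart schedule directly, so re-creating the scheduler by hand becomes optional there. A rough equivalent of the loop above:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
# T_0=steps restarts the cosine cycle every `steps` scheduler steps
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=steps, eta_min=0)

for epoch in range(5):
    for idx in range(steps):
        optimizer.step()   # dummy update; in real training this follows loss.backward()
        scheduler.step()
        print(optimizer.param_groups[0]['lr'])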


Thanks a lot!!! Working perfectly now!!!


Hi~ Why does scheduler.get_lr()[0] change after we call scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hoping for your help, thank you!

What does print(optimizer.param_groups[0]['lr']) show? Is the value constant?
Could you post the code showing how you’ve initialized your lr_scheduler?

Thank you for your reply. My code is like this:

optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)  
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')


scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1) 

for i in range(start_epoch):
    #  update the learning rate for start_epoch times
    scheduler.step()   


def train(epoch):
    print('\n ############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # Never changes

    for batch_idx, target_tuple in enumerate(train_loader):
        ...  # do sth

I haven’t figured it out yet. Could you please help me? Thanks!

Your code looks alright. Do you see a constant learning rate or another issue?


Ah, it behaves normally now… scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] now output the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes.


Hi Jia_lee, I ran into the same issue: scheduler.get_lr()[0] changes every epoch, but optimizer.param_groups[0]['lr'] never changes. How did you fix this? Thank you.

In my case, it seems that the optimizer checkpoint influences the behavior of my learning rate scheduler.

Thanks for the reply. In my case, it turned out that using %.3f to print only the first 3 decimal places of the learning rate is not enough to see the changes from optim.lr_scheduler.CosineAnnealingLR(), especially with a large epoch count. Using %.6f or scientific notation should work.
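For example, with two of the values from the log earlier in this thread:

lr1, lr2 = 0.0009997718922447668, 0.0009990877771116587
print('%.3f %.3f' % (lr1, lr2))   # 0.001 0.001 -> looks constant
print('%.6f %.6f' % (lr1, lr2))   # 0.001000 0.000999 -> the change is visible
print('%.3e %.3e' % (lr1, lr2))   # 9.998e-04 9.991e-04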

Hi @ptrblck, does scheduler.step() change the lr corresponding to the params that were passed to the optimizer? If it does, how are we supposed to restart?

I am asking because I read the following lines in the source code, which made me wonder whether the state of the optimizer is changed by the call to scheduler.step():

for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
    param_group['lr'] = lr

Thanks!

Yes, the learning rate of each param_group of the optimizer will be changed.
If you want to reset the learning rate, you could set it manually in each param_group and re-create the scheduler:

# Reset lr
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1, last_epoch=-1)

Hi @ptrblck, I don’t think this is going to work now. I heard that we need to make sure the optimizer is also fed the new learning rate for this to work. Is there any documentation on how to achieve this in PyTorch? The way Keras handles it seems very straightforward.

Could you link to the discussion so that I can have a look?
In my example the learning rates of all param_groups will be reset, but I haven’t verified the code against the current master.
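For completeness, a small self-contained check of that reset (a hypothetical setup with two param_groups; init_lr stands in for your base learning rate):

import torch.nn as nn
import torch.optim as optim

init_lr = 0.1
model = nn.Linear(10, 2)
optimizer = optim.SGD(
    [{'params': model.weight}, {'params': model.bias}],  # two param_groups
    lr=init_lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

for _ in range(3):
    optimizer.step()
    scheduler.step()
print([g['lr'] for g in optimizer.param_groups])  # decayed, roughly [1e-4, 1e-4]

# Reset lr in every param_group and re-create the scheduler
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
print([g['lr'] for g in optimizer.param_groups])  # back to [0.1, 0.1]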