How to implement torch.optim.lr_scheduler.CosineAnnealingLR?


(Gautam Venkatraman) #1

Hi,

I am trying to use SGDR (SGD with warm restarts) in my training, but I am not sure how to implement it in PyTorch.

I want the learning rate to reset every epoch.

Here is my code:

model = ConvolutionalAutoEncoder().to(device)
# model = nn.DataParallel(model)
# Loss and optimizer
learning_rate = 0.1
weight_decay = 0.005
momentum = 0.9
# criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay, momentum=momentum)
# optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, len(train_loader), eta_min=learning_rate)

params = list(model.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

num_epochs = 30
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
#         data = Variable(data, requires_grad=True)
#         print(data)
        inp, targ = data
        inp = inp.to(device)
        targ = targ.to(device)
#         inp = Variable(inp, requires_grad=True).to(device)
#         targ = Variable(targ).to(device)

        output = model(inp)
#         scheduler.zero_grad()
        loss = F.binary_cross_entropy(output, targ)

        loss.backward()
        scheduler.step()
        
        if i % 50 == 0:
            for param_group in optimizer.param_groups:
                print("Current learning rate is: {}".format(param_group['lr']))
            print("Epoch[{}/{}]({}/{}): Loss: {:.4f}".format(epoch+1,num_epochs, i, len(train_loader), loss.item()))
    

But I’m not seeing any change in the learning rate. Please help.

Thanks


#2

Since you are setting eta_min to the initial learning rate, your scheduler won’t be able to change the learning rate at all.
Set it to a low value or keep the default value of 0.

Also, the scheduler only manipulates the learning rate; it won’t update your model.
You should therefore call scheduler.step() once per epoch in the epoch loop, and call optimizer.step() inside the DataLoader loop (your current loop calls scheduler.step() there but never optimizer.step()).
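A rough sketch of that structure, reusing the names from your snippet (untested; since the scheduler now steps once per epoch, T_max should be the number of epochs, and on newer PyTorch releases scheduler.step() is expected to come after optimizer.step(), i.e. at the end of the epoch):

scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0.)

for epoch in range(num_epochs):
    scheduler.step()  # adjust the lr once per epoch (move after the inner loop on newer PyTorch)
    for i, (inp, targ) in enumerate(train_loader):
        inp, targ = inp.to(device), targ.to(device)

        optimizer.zero_grad()                        # clear gradients from the previous iteration
        output = model(inp)
        loss = F.binary_cross_entropy(output, targ)
        loss.backward()
        optimizer.step()                             # update the model parameters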


(Gautam Venkatraman) #3

Hi,

Thanks for the quick reply. I tried as you suggested but got a gradient explosion (see below).

Current learning rate is: 0.001
Epoch[1/30](0/69): Loss: 0.7209
Current learning rate is: 0.001
Epoch[1/30](50/69): Loss: 0.7745
Current learning rate is: 0.0009997718922447668
Epoch[2/30](0/69): Loss: 0.6944
Current learning rate is: 0.0009997718922447668
Epoch[2/30](50/69): Loss: 3.4404
Current learning rate is: 0.0009990877771116587
Epoch[3/30](0/69): Loss: 4.8132
Current learning rate is: 0.0009990877771116587
Epoch[3/30](50/69): Loss: 123.8725
Current learning rate is: 0.0009979482788085455
Epoch[4/30](0/69): Loss: 94.6003
Current learning rate is: 0.0009979482788085455
Epoch[4/30](50/69): Loss: 31125.7598
Current learning rate is: 0.000996354437049027
Epoch[5/30](0/69): Loss: 81586.2578
Current learning rate is: 0.000996354437049027
Epoch[5/30](50/69): Loss: 1193706.2500
Current learning rate is: 0.000994307706103767
Epoch[6/30](0/69): Loss: 1318053.0000
Current learning rate is: 0.000994307706103767
Epoch[6/30](50/69): Loss: 386166432.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](0/69): Loss: 465092672.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](50/69): Loss: 40667889664.0000
Current learning rate is: 0.0009888634581854234
Epoch[8/30](0/69): Loss: 42276134912.0000

Also, I want to reset the learning rate every epoch, but here it is only decreasing every epoch.


#4

Are you calling optimizer.zero_grad() somewhere in your training loop?
It looks like the gradients are being accumulated.
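A tiny standalone illustration of that accumulation (hypothetical toy tensor, not your model):

import torch

w = torch.ones(2, requires_grad=True)
w.sum().backward()
print(w.grad)      # tensor([1., 1.])
w.sum().backward()
print(w.grad)      # tensor([2., 2.]) -- gradients add up until they are zeroed
w.grad.zero_()     # this is what optimizer.zero_grad() does for every parameter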


(Gautam Venkatraman) #5

oh…right…missed that…thanks

However, how can I implement SGDR with warm restarts in PyTorch?
Right now the learning rate does not seem to be restarting. I want the cosine annealing to happen within every epoch and then restart for the next epoch.


#6

Would you like to lower the learning rate to its minimum in each epoch and then restart from the base learning rate?
If so, you could try the following code:

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

for epoch in range(5):
    for idx in range(steps):
        scheduler.step()
        print(scheduler.get_lr())  # on newer PyTorch versions use scheduler.get_last_lr()

    print('Reset scheduler')
    # re-creating the scheduler restarts the cosine schedule from the base learning rate
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

Note that the steps loop is basically your DataLoader loop.
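Depending on your PyTorch version, there is also torch.optim.lr_scheduler.CosineAnnealingWarmRestarts, which implements SGDR directly (it was added around 1.1, if I’m not mistaken), so you don’t have to re-create the scheduler manually. A minimal sketch:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
# T_0=steps -> one full cosine cycle per epoch when step() is called once per batch
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=steps)

for epoch in range(5):
    for idx in range(steps):
        scheduler.step()
        print(optimizer.param_groups[0]['lr'])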


(Gautam Venkatraman) #7

Thanks a lot!!! Working perfectly now!!!


(Jia Lee) #8

Hi~ Why does scheduler.get_lr()[0] change after we call scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? Hoping for your help, thank you!


#9

What does print(optimizer.param_groups[0]['lr']) show? Is the value constant?
Could you post the code showing how you’ve initialized your lr_scheduler?
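For example, something like this right after your scheduler.step() call should show whether the two values diverge (just a sketch using your variable names):

print(scheduler.get_lr()[0])             # value computed by the scheduler
print(optimizer.param_groups[0]['lr'])   # value the optimizer will actually use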


(Jia Lee) #10

Thank you for your reply. My code looks like this:

optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1)

for i in range(start_epoch):
    # advance the scheduler start_epoch times to resume the learning rate schedule
    scheduler.step()


def train(epoch):
    print('\n ############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # Never changes

    for batch_idx, target_tuple in enumerate(train_loader):
        # do sth ...

(Jia Lee) #11

I haven’t figured it out yet. Could you please help me? Thanks!


#12

Your code looks alright. Do you see a constant learning rate or another issue?


(Jia Lee) #13

Ah, it behaves normally now… scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] now output the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you.