Since you are setting eta_min to the initial learning rate, your scheduler won’t be able to change the learning rate at all.
Set it to a low value or keep the default value of 0.
Also, the scheduler will just manipulate the learning rate. It won’t update your model.
Therefore you should call scheduler.step() in the epoch loop, once per epoch (in PyTorch 1.1.0 and later, after that epoch's optimizer.step() calls), and keep optimizer.step() in the DataLoader loop.
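A minimal sketch of that loop structure, assuming a toy nn.Linear model and a few random batches in place of a real DataLoader (num_epochs, batches and lr_history are names made up for illustration). In PyTorch 1.1.0 and later, scheduler.step() is expected after the optimizer.step() calls, so here it sits at the end of each epoch; eta_min is left at its default of 0.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(10, 2)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

num_epochs = 30
# eta_min keeps its default of 0, so the lr can anneal from 1e-3 towards 0.
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

# Stand-in for a DataLoader: a few random (input, target) batches.
batches = [(torch.randn(4, 10), torch.randn(4, 2)) for _ in range(3)]

lr_history = []
for epoch in range(num_epochs):
    lr_history.append(optimizer.param_groups[0]['lr'])  # lr used in this epoch
    for data, target in batches:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()   # updates the model parameters, once per batch
    scheduler.step()       # only updates the learning rate, once per epoch

print(lr_history[0], lr_history[-1], optimizer.param_groups[0]['lr'])
```

The scheduler never touches the weights; it only rewrites the lr entry that the next optimizer.step() will read.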
Thanks for the quick reply. I tried as you suggested but got a gradient explosion (see below).
Current learning rate is: 0.001
Epoch[1/30](0/69): Loss: 0.7209
Current learning rate is: 0.001
Epoch[1/30](50/69): Loss: 0.7745
Current learning rate is: 0.0009997718922447668
Epoch[2/30](0/69): Loss: 0.6944
Current learning rate is: 0.0009997718922447668
Epoch[2/30](50/69): Loss: 3.4404
Current learning rate is: 0.0009990877771116587
Epoch[3/30](0/69): Loss: 4.8132
Current learning rate is: 0.0009990877771116587
Epoch[3/30](50/69): Loss: 123.8725
Current learning rate is: 0.0009979482788085455
Epoch[4/30](0/69): Loss: 94.6003
Current learning rate is: 0.0009979482788085455
Epoch[4/30](50/69): Loss: 31125.7598
Current learning rate is: 0.000996354437049027
Epoch[5/30](0/69): Loss: 81586.2578
Current learning rate is: 0.000996354437049027
Epoch[5/30](50/69): Loss: 1193706.2500
Current learning rate is: 0.000994307706103767
Epoch[6/30](0/69): Loss: 1318053.0000
Current learning rate is: 0.000994307706103767
Epoch[6/30](50/69): Loss: 386166432.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](0/69): Loss: 465092672.0000
Current learning rate is: 0.0009918099534735718
Epoch[7/30](50/69): Loss: 40667889664.0000
Current learning rate is: 0.0009888634581854234
Epoch[8/30](0/69): Loss: 42276134912.0000
Also, I want to reset the learning rate every epoch, but here it just keeps decreasing from one epoch to the next.
Can I implement SGDR with warm restarts in PyTorch?
Right now, it does not seem to be restarting. I want the cosine annealing to happen inside every epoch and then restart for the next epoch.
Would you like to lower the learning rate to its minimum in each epoch and then restart from the base learning rate?
If so, you could try the following code:
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)
for epoch in range(5):
    for idx in range(steps):
        scheduler.step()
        print(scheduler.get_lr())
    print('Reset scheduler')
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)
Note that the steps loop is basically your DataLoader loop.
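Recent PyTorch versions also include optim.lr_scheduler.CosineAnnealingWarmRestarts, which follows the SGDR schedule without re-creating the scheduler. A minimal sketch, assuming one restart per epoch and one scheduler.step() per batch; steps_per_epoch and lrs are illustrative names, and the bare optimizer.step() stands in for a real training step:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps_per_epoch = 10  # would be len(train_loader) in a real script

# T_0 is the number of scheduler steps until the first restart;
# T_mult=1 keeps every cycle the same length.
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=steps_per_epoch, T_mult=1, eta_min=0.)

lrs = []
for epoch in range(3):
    for idx in range(steps_per_epoch):
        lrs.append(optimizer.param_groups[0]['lr'])  # lr used for this batch
        optimizer.step()   # placeholder for forward/backward/update
        scheduler.step()   # anneal within the epoch, restart after T_0 steps
    epoch_lrs = lrs[-steps_per_epoch:]
    print('Epoch', epoch, 'first lr:', epoch_lrs[0], 'last lr:', epoch_lrs[-1])
```

With T_0 equal to the number of batches, each epoch starts again at the base learning rate and anneals towards eta_min before the next restart.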
Hi~ Why does scheduler.get_lr()[0] change after we call scheduler.step(), while optimizer.param_groups[0]['lr'] never changes in the loop? Am I missing something? I hope you can help, thank you!
optimizer = optim.SGD(posenet.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
checkpoint = torch.load(opt.ckpt_path)
posenet.load_state_dict(checkpoint['weights'])
optimizer.load_state_dict(checkpoint['optimizer_weight'])
print('Optimizer has been resumed from checkpoint...')
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2, last_epoch=-1)
for i in range(start_epoch):
    # update the learning rate for start_epoch times
    scheduler.step()

def train(epoch):
    print('\n ############################# Train phase, Epoch: {} #############################'.format(epoch))
    posenet.train()
    train_loss = 0
    scheduler.step()
    print('\nLearning rate at this epoch is: %0.9f' % scheduler.get_lr()[0])  # changes every epoch
    # print('\nLearning rate at this epoch is: ', optimizer.param_groups[0]['lr'], '\n')  # Never changes
    for batch_idx, target_tuple in enumerate(train_loader):
        do sth.....
Ah, it behaves normally now… scheduler.get_lr()[0] and optimizer.param_groups[0]['lr'] print the same value. Thank you very much, ptrblck, you have helped me several times! Best wishes to you.
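For resuming, schedulers also expose state_dict() and load_state_dict(), so the scheduler can be checkpointed next to the optimizer rather than being stepped start_epoch times. A rough sketch under several assumptions: a toy nn.Linear replaces posenet, the checkpoint is kept in memory instead of going through torch.save and torch.load, and the 'scheduler' key and the make_training_objects helper are hypothetical.

```python
import copy
import torch.nn as nn
import torch.optim as optim

def make_training_objects():
    # Hypothetical helper: builds a fresh model/optimizer/scheduler triple.
    model = nn.Linear(10, 2)
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)
    return model, optimizer, scheduler

# "First run": 12 epochs, then checkpoint all three objects.
model, optimizer, scheduler = make_training_objects()
for epoch in range(12):
    optimizer.step()   # placeholder for one epoch of training
    scheduler.step()
checkpoint = copy.deepcopy({
    'weights': model.state_dict(),
    'optimizer_weight': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),   # assumed extra checkpoint entry
})
lr_at_save = optimizer.param_groups[0]['lr']

# "Resumed run": rebuild the objects, then restore all three states.
model2, optimizer2, scheduler2 = make_training_objects()
model2.load_state_dict(checkpoint['weights'])
optimizer2.load_state_dict(checkpoint['optimizer_weight'])
scheduler2.load_state_dict(checkpoint['scheduler'])
lr_after_resume = optimizer2.param_groups[0]['lr']
print(lr_at_save, lr_after_resume, scheduler2.last_epoch)
```

After the restore, both the optimizer's lr and the scheduler's epoch counter match the values at save time, so the next decay happens at the epoch it would have without the interruption.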
Hi, Jia_lee, I ran into the same issue. scheduler.get_lr()[0] changes every epoch, but optimizer.param_groups[0]['lr'] never changes. How did you fix this? Thank you.
Thanks for the reply. In my case, it turned out that printing the learning rate with %.3f shows too few digits to see the changes made by optim.lr_scheduler.CosineAnnealingLR(), especially when the number of epochs is large. Using %.6f or scientific notation should work.
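The effect is easy to reproduce without PyTorch. The sketch below evaluates the closed-form cosine annealing value for assumed example settings (base_lr, eta_min and T_max are illustrative, not taken from the post) and prints it with three different format strings:

```python
import math

base_lr, eta_min, T_max = 1e-3, 0.0, 100   # assumed example values

def cosine_lr(t):
    # Closed-form cosine annealing value after t scheduler steps.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2

for t in (0, 1, 5):
    lr = cosine_lr(t)
    print('step %d | %%.3f: %.3f | %%.6f: %.6f | %%.3e: %.3e' % (t, lr, lr, lr))
```

With these values %.3f prints 0.001 for all three steps, and even %.6f can round the very first steps back to the starting value, so scientific notation is the safer choice.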
Hi @ptrblck, does scheduler.step() change the lr of the params that were passed to the optimizer? If it does, how are we supposed to restart?
I ask because I read the following lines in the source code, which made me wonder whether calling scheduler.step() changes the state of the optimizer:
for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
    param_group['lr'] = lr
Yes, the learning rates of each param_group of the optimizer will be changed.
If you want to reset the learning rate, you could use the same code and re-create the scheduler:
# Reset lr
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1, last_epoch=-1)
Hi @ptrblck, I don’t think this is going to work now. I heard that the optimizer also has to be given the new learning rate for this to work. Is there any documentation on how to achieve this in PyTorch? The Keras way seems very straightforward.
Could you link to the discussion, so that I can have a look?
In my example the learning rate of all param_groups will be reset, but I didn’t verify the code on the current master.
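One way to check this on an installed version is to read the lr directly from optimizer.param_groups before and after the reset. A small sketch, assuming a toy model and init_lr = 0.1 (both illustrative), with a bare optimizer.step() standing in for real training:

```python
import torch.nn as nn
import torch.optim as optim

init_lr = 0.1
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=init_lr)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

for epoch in range(3):
    optimizer.step()   # placeholder for one epoch of training
    scheduler.step()   # writes the decayed lr into optimizer.param_groups
lr_decayed = optimizer.param_groups[0]['lr']

# Reset lr in the optimizer, then re-create the scheduler on top of it.
for param_group in optimizer.param_groups:
    param_group['lr'] = init_lr
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1, last_epoch=-1)
lr_reset = optimizer.param_groups[0]['lr']

optimizer.step()
scheduler.step()
lr_next = optimizer.param_groups[0]['lr']
print(lr_decayed, lr_reset, lr_next)
```

Because the optimizer reads param_group['lr'] on every step(), no separate call is needed to hand it the new value.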