I was trying to use the CosineAnnealingLR scheduler, but I was confused about what the T_max parameter should be. Should it be the number of epochs, the length of train_loader, or the product of the two?
CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Also, can someone help me understand how it differs from CyclicLR?
The docs give you the applied formula and show how T_max is used. In particular, the current epoch counter T_cur is divided by T_max, which anneals the learning rate along a cosine curve from the base learning rate down to eta_min, which it reaches at T_cur = T_max.
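To make the role of T_max concrete, here is a minimal pure-Python sketch of the closed-form formula from the CosineAnnealingLR docs (base_lr, t_cur, and the values below are just illustrative):

```python
import math

def cosine_annealing_lr(base_lr, t_cur, t_max, eta_min=0.0):
    """Closed-form schedule from the CosineAnnealingLR docs:
    eta_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_max)) / 2

# With T_max = 10 steps (whatever unit you call scheduler.step() in),
# the lr starts at base_lr and reaches eta_min at T_cur = T_max:
lrs = [cosine_annealing_lr(0.1, t, 10) for t in range(11)]
```

So T_max is simply the number of scheduler.step() calls over which the learning rate decays from base_lr to eta_min; whether that means epochs or batches depends on where you call step(), as discussed below.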
CyclicLR cycles the learning rate between two boundaries with a constant frequency.
The original implementation in the paper updates T_cur at every iteration, which means your scheduler.step() call should be located at the end of every batch iteration rather than at the end of every epoch. The T_max value should then be num_epochs_before_restart * len(dataloader) to make it consistent with the PyTorch _LRScheduler implementation. I haven't tested whether updating the lr every iteration is empirically recommended, though.
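A rough sketch of that training-loop placement, again in plain Python (num_epochs_before_restart and batches_per_epoch are hypothetical values standing in for your epoch count and len(dataloader)):

```python
import math

num_epochs_before_restart = 5
batches_per_epoch = 100  # stand-in for len(dataloader)
t_max = num_epochs_before_restart * batches_per_epoch

def lr_at(step, base_lr=0.1, eta_min=0.0):
    # same closed-form cosine schedule as in the docs
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

step = 0
for epoch in range(num_epochs_before_restart):
    for batch in range(batches_per_epoch):
        # ... forward / backward / optimizer.step() would go here ...
        step += 1  # this is where scheduler.step() belongs (per batch)

final_lr = lr_at(step)  # the lr reaches eta_min exactly at the end of training
```

With T_max set this way, one full cosine decay spans the whole training run even though step() fires once per batch.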
Since T_cur is updated at each batch iteration t, it can take fractional values such as 0.1, 0.2, etc.