I was trying to use the CosineAnnealingLR scheduler, but I was confused about what the T_max parameter should be. Should it be the number of epochs, the length of train_loader, or the product of the two?
CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Also, can someone help me understand how it differs from CyclicLR?
The docs give you the applied formula and show how T_max is used. In particular, the current epoch counter T_cur is divided by T_max, which anneals the learning rate along a cosine curve from the base learning rate down to eta_min, which it reaches at T_cur = T_max.
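To make the role of T_max concrete, here is a minimal pure-Python sketch of the closed-form formula from the CosineAnnealingLR docs (base_lr, t_cur, and the values below are just illustrative):

```python
import math

def cosine_annealing_lr(base_lr, t_cur, t_max, eta_min=0.0):
    """Closed-form schedule from the CosineAnnealingLR docs:
    eta_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_max)) / 2

# With T_max = 10 steps (whatever unit you call scheduler.step() in),
# the lr starts at base_lr and reaches eta_min at T_cur = T_max:
lrs = [cosine_annealing_lr(0.1, t, 10) for t in range(11)]
```

So T_max is simply the number of scheduler.step() calls over which the learning rate decays from base_lr to eta_min; whether that means epochs or batches depends on where you call step(), as discussed below.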
CyclicLR cycles the learning rate between two boundaries with a constant frequency.
The original implementation in the paper updates T_cur at every iteration, which means your scheduler.step() call should be located at the end of every batch iteration rather than at the end of every epoch. The T_max value should then be num_epochs_before_restart * len(dataloader) to make it consistent with the PyTorch _LRScheduler implementation. I haven't tested whether updating the lr every iteration is empirically recommended, though.
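A rough sketch of that training-loop placement, again in plain Python (num_epochs_before_restart and batches_per_epoch are hypothetical values standing in for your epoch count and len(dataloader)):

```python
import math

num_epochs_before_restart = 5
batches_per_epoch = 100  # stand-in for len(dataloader)
t_max = num_epochs_before_restart * batches_per_epoch

def lr_at(step, base_lr=0.1, eta_min=0.0):
    # same closed-form cosine schedule as in the docs
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

step = 0
for epoch in range(num_epochs_before_restart):
    for batch in range(batches_per_epoch):
        # ... forward / backward / optimizer.step() would go here ...
        step += 1  # this is where scheduler.step() belongs (per batch)

final_lr = lr_at(step)  # the lr reaches eta_min exactly at the end of training
```

With T_max set this way, one full cosine decay spans the whole training run even though step() fires once per batch.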
Since T_cur is updated at each batch iteration t, it can take fractional values such as 0.1, 0.2, etc.