The docs give you the applied formula and show how T_max is used. In particular, the current epoch count (T_cur) is divided by T_max inside the cosine term, which anneals the learning rate from eta_max down to eta_min over T_max epochs (not up to the max. learning rate: cos(pi) = -1, so the schedule ends at the minimum).
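As a sanity check, the formula from the docs, eta_t = eta_min + (eta_max - eta_min) * (1 + cos(T_cur * pi / T_max)) / 2, can be evaluated directly in pure Python (the eta_max, eta_min, and T_max values below are arbitrary picks, not defaults):

```python
import math

def cosine_annealed_lr(t_cur, t_max, eta_max=0.1, eta_min=0.0):
    # eta_t = eta_min + (eta_max - eta_min) * (1 + cos(t_cur * pi / t_max)) / 2
    return eta_min + (eta_max - eta_min) * (1 + math.cos(t_cur * math.pi / t_max)) / 2

T_max = 10
lrs = [cosine_annealed_lr(t, T_max) for t in range(T_max + 1)]
print(lrs[0])   # starts at eta_max = 0.1 (cos(0) = 1)
print(lrs[-1])  # ends at eta_min = 0.0 (cos(pi) = -1)
```

So the schedule starts at the maximum and has decayed to the minimum exactly when T_cur reaches T_max.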

CyclicLR cycles the learning rate between two boundaries with a constant frequency.
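For contrast, here is a minimal pure-Python sketch of CyclicLR's triangular policy (the base_lr, max_lr, and step_size_up values are arbitrary assumptions for illustration):

```python
def triangular_lr(iteration, base_lr=0.001, max_lr=0.01, step_size_up=100):
    # One full cycle = step_size_up iterations rising + step_size_up falling.
    cycle_pos = iteration % (2 * step_size_up)
    # Fraction of the half-cycle completed, mirrored on the way back down.
    x = cycle_pos / step_size_up
    scale = x if x <= 1 else 2 - x
    return base_lr + (max_lr - base_lr) * scale
```

Unlike cosine annealing, the rate never settles: it bounces between the two boundaries with a fixed period.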

The original implementation in the paper updates T_cur at every iteration, which means your scheduler.step() call should be located at the end of every batch iteration instead of at the end of every epoch; T_max should then be num_epochs_before_restart * len(dataloader) to keep it consistent with the PyTorch _LRScheduler implementation. I haven't tested whether updating the lr every iteration is empirically recommended, though.
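Under that per-iteration scheme, the effective T_max measured in iterations would look like this (a sketch only: the epoch and batch counts are made up, and incrementing t_cur stands in for calling scheduler.step() after each batch):

```python
import math

num_epochs_before_restart = 5
batches_per_epoch = 200  # stand-in for len(dataloader)
# T_max counted in iterations, not epochs:
T_max = num_epochs_before_restart * batches_per_epoch

eta_max, eta_min = 0.1, 0.0
t_cur = 0
for epoch in range(num_epochs_before_restart):
    for batch in range(batches_per_epoch):
        lr = eta_min + (eta_max - eta_min) * (1 + math.cos(t_cur * math.pi / T_max)) / 2
        # ... forward / backward / optimizer.step() would go here ...
        t_cur += 1  # plays the role of scheduler.step() at the end of each batch
print(lr)  # very close to eta_min by the final batch
```

With the per-epoch convention instead, T_max would stay at num_epochs_before_restart and t_cur would only advance once per epoch, giving a much coarser schedule.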

Ref:

"Since T_cur is updated at each batch iteration t, it can take discretized values such as 0.1, 0.2, etc."