After analyzing the source code, I noticed that `OneCycleLR` stores `max_lr` and `min_lr` (alongside `initial_lr`, which is managed by the base class) in the optimizer’s parameter groups:
```python
for idx, group in enumerate(self.optimizer.param_groups):
    group['initial_lr'] = max_lrs[idx] / div_factor
    group['max_lr'] = max_lrs[idx]
    group['min_lr'] = group['initial_lr'] / final_div_factor
```
This design effectively turns these values into shared state: every `OneCycleLR` scheduler attached to the same optimizer during training (e.g., the phases of a `SequentialLR`) reads and writes the same keys in each parameter group.
This approach has the side effect of making it impossible to set different `max_lr` or `min_lr` values for multiple `OneCycleLR` instances applied to the same optimizer. Additionally, the documentation explicitly states that “This scheduler is not chainable.”
My intention is not to criticize this approach but to understand its motivation. Why was the decision made to store `OneCycleLR`-specific variables in the optimizer’s parameter groups, rather than keeping them local to the scheduler itself, especially given that scheduler-local storage might allow for greater isolation and flexibility?