After analyzing the source code, I noticed that `OneCycleLR` stores `max_lr` and `min_lr` (alongside `initial_lr`, which is managed by the base class) in the optimizer’s parameter groups:
```python
for idx, group in enumerate(self.optimizer.param_groups):
    group['initial_lr'] = max_lrs[idx] / div_factor
    group['max_lr'] = max_lrs[idx]
    group['min_lr'] = group['initial_lr'] / final_div_factor
```
This design effectively turns these values into shared state: every `OneCycleLR` scheduler attached to the same optimizer during training (e.g., the phases of a `SequentialLR`) reads and writes the same keys in each parameter group.
This approach has the side effect of making it impossible to set different `max_lr` or `min_lr` values for multiple `OneCycleLR` instances applied to the same optimizer. Additionally, the documentation explicitly states that “This scheduler is not chainable.”
My intention is not to criticize this approach but to understand its motivation. Why was the decision made to store `OneCycleLR`-specific variables in the optimizer’s parameter groups, rather than keeping them local to the scheduler itself, especially given that scheduler-local storage might allow for greater isolation and flexibility?