Why is the OneCycleLR scheduler's "max_lr" smaller than the SGD optimizer's "lr" in the documentation example?

Hello,

Referring to the documentation of OneCycleLR here, the following sample code is provided:

data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()

From my understanding, max_lr in OneCycleLR is the maximum learning rate the optimizer will reach during the cycle. What I don't understand is why, in the example code, max_lr (0.01) is smaller than the lr (0.1) passed to the SGD optimizer.
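
To make the question concrete, here is a minimal, self-contained sketch for inspecting the learning rate the optimizer actually ends up with (the tiny Linear model and the loop sizes are just stand-ins, not from the docs). My reading of the OneCycleLR documentation is that the scheduler overwrites the optimizer's lr at construction, starting from max_lr / div_factor (div_factor defaults to 25), so the lr=0.1 given to SGD would never actually be used:

import torch

# Stand-in model and loop sizes, just to make the snippet runnable.
model = torch.nn.Linear(10, 1)
steps_per_epoch = 5
epochs = 10

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, steps_per_epoch=steps_per_epoch, epochs=epochs
)

# After construction the scheduler has already replaced the optimizer's lr:
# it starts at max_lr / div_factor (default 25), i.e. 0.0004, not at 0.1.
print(optimizer.param_groups[0]["lr"])

lrs = []
for _ in range(steps_per_epoch * epochs):
    optimizer.step()      # a real loop would do forward/backward first
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

# The schedule peaks at max_lr = 0.01 and then anneals down; 0.1 never appears.
print(max(lrs))

If that reading is right, the lr=0.1 in the example is effectively dead code, which is exactly what confuses me.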

Hey, this has been a concern of mine for a long time as well! Is it a mistake in the documentation? I notice that ChatGPT produces the same pattern too.