Hello,
Referring to the OneCycleLR documentation here, the following sample code is provided:
```python
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()
```
From my understanding, the `max_lr` of OneCycleLR is the maximum learning rate the optimizer will reach during the cycle. However, I am not sure why, in the example code, `max_lr` (0.01) is smaller than the `lr` (0.1) passed to the SGD optimizer?
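For context, my current understanding of the one-cycle schedule, written out as a rough self-contained sketch (not PyTorch's exact implementation; the parameter defaults `div_factor=25`, `final_div_factor=1e4`, and `pct_start=0.3` are taken from the docs), is that the learning rate starts at `max_lr / div_factor`, ramps up to `max_lr`, then anneals down:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=0.01, div_factor=25.0,
                 final_div_factor=1e4, pct_start=0.3):
    """Rough sketch of a one-cycle schedule with cosine annealing."""
    initial_lr = max_lr / div_factor        # lr at step 0
    min_lr = initial_lr / final_div_factor  # lr at the final step
    up_steps = int(pct_start * total_steps)
    if step <= up_steps:
        # Warm-up phase: anneal from initial_lr up to max_lr.
        pct = step / up_steps
        return max_lr + (initial_lr - max_lr) * (math.cos(math.pi * pct) + 1) / 2
    # Cool-down phase: anneal from max_lr down to min_lr.
    pct = (step - up_steps) / (total_steps - up_steps)
    return min_lr + (max_lr - min_lr) * (math.cos(math.pi * pct) + 1) / 2

print(one_cycle_lr(0, 1000))    # starts well below max_lr (max_lr / 25)
print(one_cycle_lr(300, 1000))  # peaks at max_lr
```

So if this picture is right, the lr the optimizer actually uses never exceeds `max_lr`, which makes the `lr=0.1` argument to SGD look irrelevant, and that is what confuses me.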