How do I find the optimal learning rate and weight decay to use in conjunction with `OneCycleLR`?

If I use something like:

```
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()  # OneCycleLR is stepped once per batch
```

How can I find good values for `lr`, `momentum`, `max_lr`, and `weight_decay`?

This repository, https://github.com/davidtvs/pytorch-lr-finder, shows how to obtain a learning-rate curve, but how do I find an optimal value for weight decay? Surely the optimal values for both have to be found at the same time, i.e. we need to find a minimum on a 2-D loss surface rather than on a 1-D curve…
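As a rough sketch of that joint search (not an answer from any library; the model, synthetic data, and grid values below are invented purely for illustration), one option is a plain grid search that trains a fresh model for every `(max_lr, weight_decay)` pair and keeps the pair with the lowest final loss:

```python
from itertools import product

import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic regression task, standing in for a real dataset.
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
dataset = torch.utils.data.TensorDataset(X, y)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)


def train_and_evaluate(max_lr, weight_decay, epochs=5):
    """Train a fresh model with one (max_lr, weight_decay) pair; return final MSE."""
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(
        model.parameters(), lr=max_lr, momentum=0.9, weight_decay=weight_decay
    )
    # OneCycleLR overrides the optimizer's lr each step, ramping up to max_lr.
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, steps_per_epoch=len(data_loader), epochs=epochs
    )
    for _ in range(epochs):
        for xb, yb in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            scheduler.step()
    with torch.no_grad():
        return criterion(model(X), y).item()


# Joint search: every combination of the two hyperparameters is evaluated,
# so the minimum is found over the 2-D grid, not along one axis at a time.
grid = product([0.01, 0.1], [0.0, 1e-4, 1e-2])
results = {(lr, wd): train_and_evaluate(lr, wd) for lr, wd in grid}
best = min(results, key=results.get)
print(f"best (max_lr, weight_decay): {best}, final loss: {results[best]:.4f}")
```

This is brute force, so it is only practical for small grids and cheap training runs; for a real model you would typically coarsen the grid or use a smarter search (e.g. random search) over the same 2-D space.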