How do I find the optimal learning rate and weight decay to use in conjunction with `OneCycleLR`?

If I use something like:

```
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()  # OneCycleLR is stepped once per batch
```

How can I find good values for `lr`, `momentum`, `max_lr`, and `weight_decay`?

This repository, https://github.com/davidtvs/pytorch-lr-finder, shows how to obtain a learning-rate curve, but how do I find an optimal value for weight decay? Surely the optimal values for both have to be found at the same time, i.e. we need to find a minimum on a 2-D loss surface rather than on a 1-D curve…
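As a rough sketch of that joint search (not an answer from any library; the model, synthetic data, and grid values below are invented purely for illustration), one option is a plain grid search that trains a fresh model for every `(max_lr, weight_decay)` pair and keeps the pair with the lowest final loss:

```python
from itertools import product

import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic regression task, standing in for a real dataset.
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
dataset = torch.utils.data.TensorDataset(X, y)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)


def train_and_evaluate(max_lr, weight_decay, epochs=5):
    """Train a fresh model with one (max_lr, weight_decay) pair; return final MSE."""
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(
        model.parameters(), lr=max_lr, momentum=0.9, weight_decay=weight_decay
    )
    # OneCycleLR overrides the optimizer's lr each step, ramping up to max_lr.
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, steps_per_epoch=len(data_loader), epochs=epochs
    )
    for _ in range(epochs):
        for xb, yb in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            scheduler.step()
    with torch.no_grad():
        return criterion(model(X), y).item()


# Joint search: every combination of the two hyperparameters is evaluated,
# so the minimum is found over the 2-D grid, not along one axis at a time.
grid = product([0.01, 0.1], [0.0, 1e-4, 1e-2])
results = {(lr, wd): train_and_evaluate(lr, wd) for lr, wd in grid}
best = min(results, key=results.get)
print(f"best (max_lr, weight_decay): {best}, final loss: {results[best]:.4f}")
```

This is brute force, so it is only practical for small grids and cheap training runs; for a real model you would typically coarsen the grid or use a smarter search (e.g. random search) over the same 2-D space.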