How do I find the optimal learning rate and weight decay to be used in conjunction with OneCycleLr?
If I use something like:
```python
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10
)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()
```
How can I find good values for `max_lr` and the optimizer's `weight_decay`?
This repository, https://github.com/davidtvs/pytorch-lr-finder, shows how to obtain a learning-rate curve, but how do I find an optimal value for weight decay? Surely the optimal values for both have to be found jointly, i.e. we need to find a minimum on a 2-D loss surface as opposed to along a curve…
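One pragmatic way to handle the joint search (this is a common approach, not something the lr-finder repository provides) is a small 2-D grid search: for each candidate `weight_decay`, run a short training run (or an LR range test) at each candidate `max_lr`, record the validation loss, and pick the pair with the lowest loss. A minimal sketch, where `short_run_val_loss` is a hypothetical stand-in you would replace with a real abbreviated OneCycleLR training run:

```python
import itertools
import math

def short_run_val_loss(max_lr, weight_decay):
    # Stand-in for a short OneCycleLR training run returning validation
    # loss; replace with real training of your model. This toy surface
    # has its minimum near max_lr=1e-2, weight_decay=1e-4 (hypothetical).
    return (math.log10(max_lr) + 2) ** 2 + (math.log10(weight_decay) + 4) ** 2

# Candidate values, log-spaced as is conventional for both hyperparameters.
lrs = [1e-3, 1e-2, 1e-1]
wds = [1e-5, 1e-4, 1e-3]

# Evaluate every (max_lr, weight_decay) pair and keep the lowest-loss one.
best_lr, best_wd = min(
    itertools.product(lrs, wds),
    key=lambda pair: short_run_val_loss(*pair),
)
print(best_lr, best_wd)
```

Since each grid point only needs a coarse loss estimate, a handful of epochs per run is usually enough to rank the candidates; the learning-rate curve from the lr-finder can also narrow the `max_lr` candidates first, so the grid over `weight_decay` stays small.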