How do I find the optimal learning rate and weight decay to use in conjunction with `OneCycleLR`?

If I use something like:

```
data_loader = torch.utils.data.DataLoader(...)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for batch in data_loader:
        train_batch(...)
        scheduler.step()  # OneCycleLR is stepped once per batch
```

How can I find good values for `lr`, `momentum`, `max_lr`, and `weight_decay`?

This repository, https://github.com/davidtvs/pytorch-lr-finder, shows how to obtain a learning-rate curve, but how do I find an optimal value for weight decay? Surely the optimal values for both have to be found at the same time, i.e. we need to find a minimum on a 2-D loss surface rather than on a 1-D curve…
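As a rough sketch of that joint search (not an answer from any library; the model, synthetic data, and grid values below are invented purely for illustration), one option is a plain grid search that trains a fresh model for every `(max_lr, weight_decay)` pair and keeps the pair with the lowest final loss:

```python
from itertools import product

import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic regression task, standing in for a real dataset.
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
dataset = torch.utils.data.TensorDataset(X, y)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)


def train_and_evaluate(max_lr, weight_decay, epochs=5):
    """Train a fresh model with one (max_lr, weight_decay) pair; return final MSE."""
    model = nn.Linear(10, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(
        model.parameters(), lr=max_lr, momentum=0.9, weight_decay=weight_decay
    )
    # OneCycleLR overrides the optimizer's lr each step, ramping up to max_lr.
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, steps_per_epoch=len(data_loader), epochs=epochs
    )
    for _ in range(epochs):
        for xb, yb in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            scheduler.step()
    with torch.no_grad():
        return criterion(model(X), y).item()


# Joint search: every combination of the two hyperparameters is evaluated,
# so the minimum is found over the 2-D grid, not along one axis at a time.
grid = product([0.01, 0.1], [0.0, 1e-4, 1e-2])
results = {(lr, wd): train_and_evaluate(lr, wd) for lr, wd in grid}
best = min(results, key=results.get)
print(f"best (max_lr, weight_decay): {best}, final loss: {results[best]:.4f}")
```

This is brute force, so it is only practical for small grids and cheap training runs; for a real model you would typically coarsen the grid or use a smarter search (e.g. random search) over the same 2-D space.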