Learning rate vs AUC


Instead of training with many different learning rates, Leslie N. Smith proposed a method (the learning rate range test) to find a good lr. However, I don’t understand why we can’t simply train with several lr values, plot lr vs. AUC, and pick the best one. Is it because that approach is computationally expensive?
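To make sure I understand the range test, here is a rough sketch of it on a toy one-parameter problem. Everything specific here is my own assumption: the quadratic loss, the exponential lr sweep (as in fastai-style lr finders; Smith's paper describes a linear increase), the "stop once loss exceeds 4× the best" rule, and the "largest single-step drop in loss" heuristic. The point is that it trains once while raising lr every step and records training loss, rather than doing one full run per candidate lr.

```python
import math

def lr_range_test(grad_fn, loss_fn, w0, lr_min=1e-5, lr_max=10.0, steps=100, stop_factor=4.0):
    """One SGD run with lr increased exponentially from lr_min to lr_max.

    Returns (lr, loss_before_step) pairs, stopping early once the loss
    exceeds stop_factor times the best loss seen so far (divergence).
    """
    history = []
    w = w0
    best = math.inf
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        loss = loss_fn(w)
        if loss > stop_factor * best:
            break  # loss is blowing up: lr is now too large
        best = min(best, loss)
        history.append((lr, loss))
        w = w - lr * grad_fn(w)  # plain SGD step with the current lr
    return history

def suggest_lr(history):
    """Return the lr whose step produced the largest drop in loss (simplified heuristic)."""
    drops = [history[i][1] - history[i + 1][1] for i in range(len(history) - 1)]
    best_i = max(range(len(drops)), key=drops.__getitem__)
    return history[best_i][0]

# Toy problem (assumption): minimize 0.5 * w**2, whose gradient is w.
history = lr_range_test(grad_fn=lambda w: w, loss_fn=lambda w: 0.5 * w * w, w0=1.0)
print(f"swept {len(history)} steps, suggested lr ~ {suggest_lr(history):.3g}")
```

On a real network the loss would be the mini-batch training loss, so the test says where training is fast and stable, not which lr gives the best validation AUC at the end.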

Since my dataset is very small, a full training run (all epochs) takes only about 4 minutes. In my case, would it be better to choose the lr based on the AUC it gives?
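Concretely, the brute-force alternative I have in mind looks something like this sketch. It is only a toy stand-in: the synthetic data (`make_data`), the pure-Python logistic regression (`train_logreg`) and the `roc_auc` helper are hypothetical, not my real model. Each lr gets a full training run with the same seed, and the lr with the highest validation AUC is kept.

```python
import math
import random

def roc_auc(labels, scores):
    """AUC = probability a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, rng):
    """Hypothetical toy data: label is 1 when a noisy sum of two features is positive."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if x[0] + x[1] + rng.gauss(0, 1) > 0 else 0
        data.append((x, y))
    return data

def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(train, lr, epochs=20, seed=0):
    """Plain SGD on logistic loss; a stand-in for the real model (assumption)."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        order = list(train)
        rng.shuffle(order)
        for x, y in order:
            g = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y  # dLoss/dlogit
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b

rng = random.Random(42)
train, val = make_data(80, rng), make_data(40, rng)

lrs = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
results = {}
for lr in lrs:
    w, b = train_logreg(train, lr)  # same seed for every lr, so only lr changes
    scores = [w[0] * x[0] + w[1] * x[1] + b for x, _ in val]  # AUC only needs a ranking
    results[lr] = roc_auc([y for _, y in val], scores)
    print(f"lr={lr:g}  val AUC={results[lr]:.3f}")

best_lr = max(results, key=results.get)
print("best lr by validation AUC:", best_lr)
```

With a very small validation set, AUC differences between neighbouring lr values may be within noise, so repeating each run with a few seeds might be needed before trusting the winner.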

In that case, which lr values should I try, and how far apart should they be? For example, should I go from 0.1 down to 0.000001 dividing by 10 each time, or by 5?
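For comparison, a small helper (the name `geometric_grid` is just mine) shows what each spacing produces. It only does the arithmetic of repeatedly dividing by a factor, with a small tolerance so floating-point rounding doesn't drop the last value.

```python
import math

def geometric_grid(high, low, factor):
    """Learning rates from high down to low, dividing by `factor` each step."""
    lrs = []
    lr = high
    while lr >= low * (1 - 1e-9):  # tolerance for float rounding near `low`
        lrs.append(lr)
        lr /= factor
    return lrs

by_10 = geometric_grid(0.1, 1e-6, 10)            # 6 values: 0.1, 0.01, ..., ~1e-06
by_5 = geometric_grid(0.1, 1e-6, 5)              # 8 values, ending at about 1.28e-06
by_sqrt10 = geometric_grid(0.1, 1e-6, math.sqrt(10))  # 11 values, two per decade
print(len(by_10), len(by_5), len(by_sqrt10))
```

Dividing by 5 gives more points per decade but no longer lands on powers of ten (0.1, 0.02, 0.004, 0.0008, ...), whereas dividing by sqrt(10) ≈ 3.16 gives two points per decade that do.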

Thanks for your help.