How do AI researchers find the right hyperparameters for the models they create? Their papers often list the hyperparameter configurations, but how do they arrive at them? Is there a procedure for identifying which hyperparameter is problematic (the number of layers, the learning rate, or both, for example)? And how can you tell whether your network is wrong altogether or you only need a different activation function?
Most commonly, folks just try a bunch of different things until they get
tired of trying more things, and then use the best combination they found.
Sometimes they try things based on experience or intuition or ideas they’ve
seen in the literature or suggestions friends make. But a lot of the time
they just try things at random.
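To make "trying things at random" concrete, here is a minimal sketch of a random hyperparameter search; the search space, the value ranges, and the toy scoring function are all illustrative stand-ins, not taken from any particular paper:

```python
import math
import random

# Hypothetical search space: each entry samples one hyperparameter.
# The ranges here are made up for illustration.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),  # log-uniform
    "num_layers": lambda: random.randint(2, 12),
    "activation": lambda: random.choice(["relu", "gelu", "tanh"]),
}

def evaluate(config):
    """Stand-in for 'train the model and measure validation accuracy'.

    A real run would build and train a network with this config; this
    toy score just peaks near lr=1e-3 and 6 layers so the loop has
    something to optimize.
    """
    lr_term = -abs(math.log10(config["learning_rate"]) + 3)
    depth_term = -abs(config["num_layers"] - 6) / 6
    return lr_term + depth_term

best_score, best_config = float("-inf"), None
for trial in range(50):  # keep sampling until the trial budget runs out
    config = {name: sample() for name, sample in space.items()}
    score = evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config)
```

In practice the expensive part is `evaluate` (a full training run), so the budget is usually set by compute rather than by exhausting the space, which matches the "until they get tired of trying" description above.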
To be a little less flippant, people will systematically study various
hyperparameters. The ConvNeXt paper is one example of this, although
it is more about systematically adjusting architectural features, rather
than tuning training hyperparameters.