What are good practices for obtaining hyperparameters for new Deep Learning architectures?

I am concerned with two scenarios:

  1. When running small-scale experiments such as CIFAR/MNIST, what are good practices for obtaining hyperparameters for a new architecture?

  2. When running large-scale experiments such as ImageNet, what are good practices for obtaining hyperparameters for a new architecture?

Since obtaining new hyperparameters can be expensive, I would like to know good practices for choosing their values.
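To make "obtaining hyperparameters" concrete, this is roughly the kind of brute-force random search I would like to avoid running at full scale (a minimal sketch; `train_and_eval` is a hypothetical stub standing in for a short training run on a small data subset):

```python
import math
import random

def train_and_eval(config):
    """Hypothetical placeholder: build the new architecture with `config`,
    train briefly (e.g. a few epochs on a CIFAR subset), and return
    validation accuracy."""
    return random.random()  # stub result for illustration only

def sample_config():
    # Sample learning rate and weight decay log-uniformly; pick momentum
    # from a small discrete set.
    return {
        "lr": 10 ** random.uniform(-4, -1),
        "weight_decay": 10 ** random.uniform(-6, -3),
        "momentum": random.choice([0.9, 0.95, 0.99]),
    }

best_config, best_acc = None, -math.inf
for trial in range(20):  # each trial is one (cheap) training run
    config = sample_config()
    acc = train_and_eval(config)
    if acc > best_acc:
        best_config, best_acc = config, acc
    print(f"trial {trial}: {config} -> val acc {acc:.3f}")

print("best config:", best_config, "val acc:", best_acc)
```

Even this kind of search gets costly quickly, which is why I am asking for better practices.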

[Note: it would be nice to have answers that do not depend on batch norm, rather than just "batch norm solves everything, try anything"; I am hoping for a bit more detail than that.]