A learning rate of 0.001 is the usual choice when training a network with Adam, but the appropriate learning rate for SGD varies widely (I have even seen it set to 30 when training an LSTM on the PTB dataset for language modeling).
Given that, is there a specific criterion or rule of thumb for setting the learning rate of SGD?
(For example, does it depend on the type of network or task?)
Or do I have to rely entirely on experience and trial and error?
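
To make the contrast concrete, here is a minimal sketch in plain Python of a single update on one scalar parameter. It is a toy illustration under my own assumptions, not code from any library: the helper names `sgd_step` and `adam_first_step` are hypothetical, and the Adam step uses the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8) starting from zero-initialized moments. It suggests why one Adam value can carry over between problems while the SGD value may not: the SGD step scales with the raw gradient, whereas the first Adam step has a size close to the learning rate regardless of the gradient's scale.

```python
import math

def sgd_step(grad, lr):
    # Plain SGD: the update is proportional to the raw gradient.
    return -lr * grad

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First Adam update, starting from zero-initialized moment estimates.
    m = (1 - beta1) * grad           # first moment (mean of gradients)
    v = (1 - beta2) * grad * grad    # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1)          # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

# Compare both updates for gradients of very different magnitude.
for grad in (1e-4, 1.0, 1e4):
    print(f"grad={grad:g}  sgd={sgd_step(grad, lr=0.001):g}  "
          f"adam={adam_first_step(grad):g}")
```

With the same lr of 0.001, the SGD update changes by eight orders of magnitude across these gradients, while the Adam update stays near 0.001 in size. This only covers the very first step of a toy case, so it says nothing about which SGD value is right for a given network.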