How do I choose the learning rate for SGD?

People usually use 0.001 when training a network with Adam, but the appropriate learning rate for SGD varies widely. (I have even seen it set to 30 when training an LSTM on the PTB dataset for language modeling!)

Given that, is there a specific criterion or rule of thumb for setting the learning rate of SGD?
(For example, does it depend on the type of network or task?)
Or should I rely entirely on experience?

I'm very confused…

It mostly comes down to experience, and it depends on the field and the problem you are solving. There are some methods that suggest a learning rate. In my experience, a good learning rate for SGD is higher than a good one for Adam.
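As a rough illustration of what a method that suggests a learning rate can look like, here is a minimal sketch of a learning rate range test: sweep the learning rate exponentially from very small to very large while training with plain SGD, record the loss at each step, and pick a value somewhat below the point where the loss is lowest. This is an assumption about the kind of method meant here, not necessarily the one referred to above. The toy linear-regression problem, the function names `lr_range_test` and `suggest_lr`, and the "divide by 10" rule are all illustrative choices.

```python
import math
import random


def lr_range_test(lr_min=1e-5, lr_max=10.0, num_steps=100, seed=0):
    """Sweep the learning rate from lr_min to lr_max and record (lr, loss)."""
    rng = random.Random(seed)

    # Hypothetical toy data: y = 3x + small noise. Replace with a real model.
    data = []
    for _ in range(256):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.1)))

    w = 0.0  # single trainable weight
    history = []
    for step in range(num_steps):
        # Exponential schedule: equal multiplicative increase per step.
        lr = lr_min * (lr_max / lr_min) ** (step / (num_steps - 1))

        batch = rng.sample(data, 32)
        loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
        history.append((lr, loss))

        # Stop once the loss has clearly diverged.
        if not math.isfinite(loss) or loss > 4.0 * history[0][1]:
            break

        # Plain SGD step on the mean squared error.
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return history


def suggest_lr(history):
    """Rule of thumb (an assumption): one tenth of the lr with the lowest loss."""
    best_lr, _ = min(history, key=lambda pair: pair[1])
    return best_lr / 10.0


if __name__ == "__main__":
    hist = lr_range_test()
    print("suggested learning rate:", suggest_lr(hist))
```

The suggested value is only a starting point. It depends on the model, the batch size, and the data, which is consistent with SGD learning rates differing so much from task to task.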

Thank you so much
I’ll read this paper
It will probably help me a lot
Have a nice day!