How do you choose a good initial learning rate for Adam?
When it's too big, why does the network converge to a wrong solution? Normally a learning rate that is too big should make the loss oscillate a lot, not get the model stuck in a wrong solution, right?
You could either run a few experiments with different learning rates, or use a utility that searches for an "optimal" learning rate, e.g. the learn.lr_find() operation from fastai.
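The first suggestion (running a few experiments) can be sketched in plain PyTorch. This is a toy example under assumed conditions: the model, data, candidate learning rates, and step count are all placeholders you would swap for your own setup.

```python
# Sketch: pick an initial learning rate for Adam by running a few short
# trials and comparing the loss after a fixed number of steps.
# Model, data, and candidate rates are toy placeholders.
import torch
import torch.nn as nn

def trial_loss(lr, steps=100, seed=0):
    torch.manual_seed(seed)                    # same init/data for every trial
    x = torch.randn(64, 10)
    y = x @ torch.randn(10, 1)                 # linear target the model can fit
    model = nn.Linear(10, 1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

candidates = [1e-1, 1e-2, 1e-3, 1e-4]          # arbitrary grid, adjust to taste
losses = {lr: trial_loss(lr) for lr in candidates}
best_lr = min(losses, key=losses.get)
print(best_lr, losses)
```

In a real setting you would run each trial on your actual model and a subset of your training data, and you might also watch for the oscillation/divergence behavior mentioned above, not just the final loss.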
Thanks so much. Is learn.lr_find() restricted to any particular network? I still don't know how to use it, since I don't use the model from the link you posted.
No, I don't think this operation depends on the model architecture. You would need to install and use the fastai
package, or reimplement their learning rate finder in pure PyTorch.
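A reimplementation along those lines can be sketched as follows. This is a simplified version of the learning-rate range test idea behind learn.lr_find(), not fastai's actual code: the learning rate grows exponentially each batch while the loss is recorded, and the test stops once the loss clearly diverges. The model, data, and divergence threshold here are assumptions for illustration.

```python
# Sketch of a learning-rate range test in plain PyTorch:
# raise the lr exponentially each step, log (lr, loss), and stop
# once the loss blows up past a multiple of the best loss seen.
import math
import torch
import torch.nn as nn

def lr_range_test(model, opt, loss_fn, batches, start_lr=1e-7, end_lr=10.0):
    num = len(batches)
    gamma = (end_lr / start_lr) ** (1.0 / max(num - 1, 1))  # per-step multiplier
    lr, history, best = start_lr, [], float("inf")
    for x, y in batches:
        for group in opt.param_groups:
            group["lr"] = lr                  # set the lr for this step
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        history.append((lr, loss.item()))
        best = min(best, loss.item())
        if math.isnan(loss.item()) or loss.item() > 4 * best:
            break                             # loss diverged; end the test
        lr *= gamma
    return history

# Toy usage with placeholder model and random data:
torch.manual_seed(0)
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters())
data = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]
hist = lr_range_test(model, opt, nn.functional.mse_loss, data)
```

You would then plot loss against learning rate (log scale) and pick a value somewhat below the point where the loss is falling fastest, which is roughly what fastai's plot is used for.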