Learning rate in Adam

How do I choose a good initial learning rate for Adam?
Also, when the learning rate is too large, why does the net converge to a wrong solution? Normally, an overly large learning rate should make the loss oscillate a lot, not leave the model stuck at a wrong solution, shouldn't it?

You could either run a few experiments with different learning rates, or use a utility that searches for an "optimal" learning rate, such as the learn.lr_find() operation from fastai.
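
For the first option, here is a minimal sketch in plain PyTorch. The toy model, random data, and candidate learning rates are all hypothetical stand-ins; the point is just to compare the final loss across a few typical Adam learning rates from the same initialization:

```python
import torch
import torch.nn as nn

def train_once(lr, n_steps=200):
    torch.manual_seed(0)  # same init and data each run, for a fair comparison
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    x, y = torch.randn(256, 10), torch.randn(256, 1)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Sweep a few common Adam learning rates and compare the final losses.
for lr in (1e-4, 3e-4, 1e-3, 3e-3, 1e-2):
    print(f"lr={lr:.0e}  final loss={train_once(lr):.4f}")
```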

Thanks so much. Is learn.lr_find() restricted to any particular network? I still don't know how to use it, since I don't use the net in the link you posted.

No, I don't think this operation depends on the model architecture. You would need to either install the fastai package or reimplement its learning rate finder in pure PyTorch.
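
In case a pure-PyTorch version helps, here is a rough sketch of a learning rate range test in the spirit of Leslie Smith's method, which fastai's finder is based on: the learning rate grows exponentially over a sweep of batches while the loss is recorded, and the sweep stops once the loss blows up. The model, data, and LR bounds below are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical placeholders: swap in your own model, loss, and data loader.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
batches = [(torch.randn(64, 10), torch.randn(64, 1)) for _ in range(100)]

start_lr, end_lr = 1e-6, 1.0
opt = torch.optim.Adam(model.parameters(), lr=start_lr)
# Multiply the LR by a constant factor each batch so it grows
# exponentially from start_lr to end_lr over the sweep.
mult = (end_lr / start_lr) ** (1.0 / (len(batches) - 1))

lrs, losses = [], []
best_loss = float("inf")
for x, y in batches:
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

    lrs.append(opt.param_groups[0]["lr"])
    losses.append(loss.item())
    best_loss = min(best_loss, loss.item())
    if loss.item() > 4 * best_loss:   # loss diverged: stop the sweep
        break
    for g in opt.param_groups:        # raise the LR for the next batch
        g["lr"] *= mult

# Plot losses vs. lrs on a log-x axis and pick a learning rate a bit
# below the point where the loss starts to climb.
```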
