Hi, I am using Google’s Lion optimizer to minimize a HuberLoss in my model. I use a learning rate of 6e-4 and a weight decay of 1e-2, with β1 = 0.9 and β2 = 0.99 as recommended for the optimizer.
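For reference, here is a minimal sketch of this setup in plain Python. It is not my actual training code: the names `lion_step` and `huber_loss` and the Huber threshold `delta=1.0` are assumptions made for illustration, and the update follows the Lion rule as I understand it (the sign of an interpolated momentum, plus decoupled weight decay).

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def huber_loss(residual, delta=1.0):
    """Huber loss for one residual; delta=1.0 is an assumed threshold."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def lion_step(params, grads, momentum,
              lr=6e-4, beta1=0.9, beta2=0.99, weight_decay=1e-2):
    """One Lion update over flat lists of floats (hypothetical helper).

    Returns the updated parameters and the updated momentum.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        update = sign(c) + weight_decay * p  # decoupled weight decay
        new_params.append(p - lr * update)
        # The momentum is tracked with the second coefficient, beta2.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    params, momentum = lion_step([1.0], [0.5], [0.0])
    print(params, momentum)
```

Because only the sign of the interpolated momentum is used, each weight moves by roughly the learning rate per step (plus the weight decay term), regardless of how small the gradient is.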
I set up a 10000-epoch run with batch size 60; early stopping ended it at epoch 2903 under the loss criterion of 3e-5. As far as I know, the mean loss has never dropped below 1e-5.
So my question is: can this result be improved any further? Any advice or experience would be a lifesaver.