How to choose learning rate when using Mixed Precision Training

I’m training ResNet-50 on ImageNet using fastai v1 1.0.60dev (PyTorch 1.3.0). Mixed precision training gives substantial speed gains: the reduced VRAM consumption lets me use 4 times the batch size (512) at an image size of 224. The problem is that I can’t find a good learning rate. I’m using fastai’s LR finder (lr_find) with SGD, but the suggested lr causes severe overfitting. Dividing the suggested lr by 100 still overfits a bit, while lr divided by 512 seems OK-ish but slow.
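For reference, here’s roughly how I set things up (the dataset path is a placeholder; batch size 512, image size 224, and SGD match what I described above):

```python
from functools import partial
from fastai.vision import *
from torch import optim

# 'path/to/imagenet' is a placeholder; bs=512 and size=224 as in the post.
data = (ImageDataBunch
        .from_folder(Path('path/to/imagenet'), size=224, bs=512)
        .normalize(imagenet_stats))

# SGD with momentum, since cnn_learner defaults to Adam.
learn = cnn_learner(data, models.resnet50,
                    opt_func=partial(optim.SGD, momentum=0.9)).to_fp16()

# Run the LR finder on the same fp16 Learner used for training,
# so the suggestion reflects mixed-precision dynamics.
learn.lr_find()
learn.recorder.plot(suggestion=True)  # marks the steepest-gradient point
```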

While these guesses work, I’m not sure how to choose a good learning rate in general. I thought about using NVIDIA’s APEX directly, but dynamic loss scaling already seems to be integrated into learn.to_fp16(). Training is done with learn.fit_one_cycle(), i.e. the 1cycle policy. Everything else appears to be working fine.
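Concretely, the training calls look like this (epoch count and max_lr are placeholders, max_lr being exactly the value I’m trying to choose):

```python
# to_fp16() wraps the Learner with fastai's MixedPrecision callback;
# dynamic loss scaling is its default, so APEX isn't needed on top.
learn = learn.to_fp16(dynamic=True)   # explicit here, but dynamic is the default
learn.fit_one_cycle(30, max_lr=1e-3)  # 1cycle policy; max_lr is the open question
```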
Below is the loss plot for the suggested lr divided by 100:

[loss plot: lr/100]

And this is for the suggested lr divided by 512:

[loss plot: lr/512]

Could you try a learning rate in between lr/100 and lr/512, or stick with lr/100 and decay the learning rate over time? Also, this might be a better question for the fastai forums, since it uses fastai’s library: https://forums.fast.ai/c/fastai-users
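Something along these lines might work for probing values in between (untested sketch; base_lr stands in for lr_find’s suggestion, and 5 epochs is just a quick probe, not full training):

```python
from fastai.vision import *

base_lr = 1e-1  # placeholder for the lr_find suggestion
for divisor in (100, 200, 350, 512):
    # Fresh Learner per run so earlier probes don't contaminate later ones.
    learn = cnn_learner(data, models.resnet50).to_fp16()
    learn.fit_one_cycle(5, max_lr=base_lr / divisor)
    print(f"lr/{divisor}:", learn.validate())  # [val_loss, metrics...]
```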

I posted the question there but haven’t gotten a reply yet. I’ll try varying the learning rates as suggested.