7974 is the number of iterations run by the time training finished, and exp_avg and exp_avg_sq both have shape 512.
Do they correspond to g_t and g_t^2? If the net is overfitting, should I build a new optimizer in order to clear exp_avg and exp_avg_sq, or should I just modify the lr?
Can anyone help? Can I clear these states?
I’m not sure that manipulating the internal optimizer states is related to countering overfitting.
What would be the idea behind it?
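For reference, if you do want to try it, here is a minimal sketch of both options from the question. It assumes `torch.optim.Adam`, and it uses a hypothetical single 512-element parameter and a toy loss as stand-ins for the real model. Clearing `optimizer.state` makes Adam re-create its buffers lazily on the next `step()`. Changing the lr only touches `param_groups` and leaves the moment estimates alone.

```python
import torch

# Hypothetical stand-in for the real model: one parameter with 512 elements.
param = torch.nn.Parameter(torch.randn(512))
optimizer = torch.optim.Adam([param], lr=1e-3)

# Take a few steps so Adam creates its per-parameter state.
for _ in range(3):
    optimizer.zero_grad()
    loss = (param ** 2).sum()  # toy loss, for illustration only
    loss.backward()
    optimizer.step()

# Inspect the state Adam keeps for this parameter.
state = optimizer.state[param]
print(sorted(state.keys()))
print(state["exp_avg"].shape, state["exp_avg_sq"].shape)

# Option 1: drop all optimizer state. Adam re-initializes the buffers
# (zeros, step counter restarted) the next time step() is called.
optimizer.state.clear()

# Option 2: keep the optimizer and only change the learning rate.
for group in optimizer.param_groups:
    group["lr"] = 1e-4
```

Either way there is no need to build a new optimizer object. Whether resetting the moments helps against overfitting at all is a separate question.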