AdamW handles weight decay correctly, and PyTorch already implements it as `torch.optim.AdamW`. Why, then, does `torch.optim.Adam` also have a `weight_decay` parameter?
Also, which optimizer is the correct one to use in PyTorch when I want weight decay? I am confused because most papers use Adam with weight decay instead of AdamW. Why do they do that?
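For context, here is my understanding of the difference, written as a minimal single-step sketch in plain Python (not PyTorch's actual implementation): Adam's `weight_decay` folds the decay term into the gradient before the adaptive moment estimates, while AdamW applies the decay directly to the weight, outside the adaptive scaling.

```python
import math

def adam_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              wd=0.01, t=1, decoupled=False):
    """One optimizer step on a scalar parameter p with gradient g."""
    if not decoupled:
        # Adam-style: L2 regularization is folded into the gradient,
        # so the decay gets rescaled by the adaptive denominator below.
        g = g + wd * p
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # AdamW-style: decay the weight directly, decoupled from the
        # gradient and the adaptive scaling.
        p = p - lr * wd * p
    return p, m, v

# Same starting weight and gradient, but the two rules give different results
p_adam,  _, _ = adam_step(1.0, 0.5, 0.0, 0.0, decoupled=False)
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, decoupled=True)
```

Is this a fair summary of why the two parameters behave differently, and of why the distinction matters in practice?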