Is L2 regularization present in the Adam optimizer?

In the PyTorch implementation of the Adam optimizer, does weight_decay mean L2 regularization? The PyTorch documentation describes weight_decay as an L2 penalty.

But I have read a few articles and blog posts stating that weight_decay is equivalent to L2 regularization only for vanilla SGD, and that it means something different for adaptive optimizers like Adam. Is that true?
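
To make the distinction I am asking about concrete, here is a minimal scalar sketch (plain Python, not the actual PyTorch source; the function name and simplifications are mine) of the two ways weight decay can be combined with an Adam step. `decoupled=False` folds the decay into the gradient, which is what I understand classic L2 regularization to do, while `decoupled=True` shrinks the parameter directly, outside the adaptive statistics, in the style of AdamW:

```python
import math

def adam_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam update for a single scalar parameter.

    decoupled=False: weight_decay is added to the gradient (L2 style),
    so the decay term also flows through the adaptive m/v moments.
    decoupled=True: the decay is applied directly to the parameter,
    bypassing the moments (AdamW-style decoupled weight decay).
    """
    if not decoupled:
        grad = grad + weight_decay * p        # L2: decay enters the moments
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    if decoupled:
        p = p - lr * weight_decay * p         # plain multiplicative shrinkage
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# With weight_decay > 0 the two variants give different parameters
# after even a single step, because in the coupled case the decay
# term is rescaled by the adaptive denominator.
p_l2, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, 1, weight_decay=0.1)
p_dec, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, 1, weight_decay=0.1, decoupled=True)
```

My question is essentially whether PyTorch's Adam with weight_decay behaves like the first branch (true L2) or like the second.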