Weight decay vs L2 regularisation

I just want to confirm whether the weight decay parameter in optimisers is equivalent to applying L2 regularisation. According to fastai's article on this, weight decay and L2 regularisation are only equivalent for vanilla SGD. There is also the Decoupled Weight Decay Regularization paper, which argues that decoupled weight decay is a better alternative to adding an L2 penalty to the loss. I just wanted to know how this is done in PyTorch.
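
To make my question concrete, here is a small sketch of my current understanding (plain SGD on a single tensor, with made-up values for `lr` and `wd`). For vanilla SGD the two updates should coincide, which is why I'm unsure what the `weight_decay` argument actually does for other optimisers:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3)
grad = torch.randn(3)   # pretend this is dL/dw from backprop
lr, wd = 0.1, 0.01      # hypothetical learning rate and weight decay

# L2 regularisation: the penalty (wd/2) * ||w||^2 is added to the loss,
# so its gradient wd * w gets added to the gradient before the step.
w_l2 = w - lr * (grad + wd * w)

# Decoupled weight decay: the weights are shrunk directly,
# independently of the loss gradient.
w_decoupled = w - lr * grad - lr * wd * w

# For vanilla SGD the two rules give the same update.
print(torch.allclose(w_l2, w_decoupled))  # True
```

My impression is that the `weight_decay` argument of `torch.optim.SGD` and `torch.optim.Adam` does the first thing (adds `wd * w` to the gradient), while `torch.optim.AdamW` does the second (decoupled), but I'd appreciate confirmation.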