I’m sorry, I’m new to PyTorch, and I can’t find how PyTorch implements L2 regularization (weight_decay).
There are several formulas for L2 regularization in use; which one does PyTorch implement? It matters because it determines how large a value I need to set.
Looking at the code for the SGD optimizer in particular, it looks like it’s implemented by
adding weight_decay * data (the parameter values) to the gradients. Does this answer your question?
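
On the "which formula" part: adding weight_decay * w to the gradient is the same as adding a penalty of (weight_decay / 2) * sum(w ** 2) to the loss, since the gradient of that penalty is weight_decay * w. Here is a minimal sketch in plain Python to check this. It is not PyTorch's actual code; `sgd_step` and `l2_penalty_grad` are hypothetical helper names, the numbers are made up, and it assumes plain SGD with no momentum.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain SGD step with weight decay folded into the gradient."""
    # Add weight_decay * w to each gradient component, as described above.
    decayed = [g + weight_decay * wi for g, wi in zip(grad, w)]
    return [wi - lr * d for wi, d in zip(w, decayed)]


def l2_penalty_grad(w, weight_decay):
    """Gradient of the penalty (weight_decay / 2) * sum(w_i ** 2)."""
    return [weight_decay * wi for wi in w]


w = [1.0, -2.0]
grad = [0.5, 0.5]  # hypothetical gradient of the data loss
lr, wd = 0.1, 0.01

# Route 1: let the optimizer step apply the weight decay.
via_decay = sgd_step(w, grad, lr, wd)

# Route 2: add the penalty gradient by hand, then step with no decay.
explicit = [g + p for g, p in zip(grad, l2_penalty_grad(w, wd))]
via_penalty = sgd_step(w, explicit, lr, 0.0)
```

Both routes give the same updated weights. So if the formula you are comparing against writes the penalty as lambda * sum(w ** 2) without the 1/2, the matching value here would be weight_decay = 2 * lambda.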