Adding an L2 norm term to the Adam optimizer

I’m trying to understand how the Adam optimizer is implemented in PyTorch.

Basically, I would like to penalize the returned loss with the L2 norm of some noise variable (for a specific problem I’m working on).

The way the loss enters the implementation isn’t as intuitive as the paper suggests. What’s the recommended way to do this?
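For concreteness, here is a minimal sketch of what I have in mind. The noise tensor `noise`, the penalty weight `lambda_l2`, and the toy model/data are all made up for illustration:

```python
import torch

# Hypothetical setup: a toy model, some data, and a learnable noise variable
model = torch.nn.Linear(10, 1)
noise = torch.randn(10, requires_grad=True)  # the noise variable I want to penalize
optimizer = torch.optim.Adam(list(model.parameters()) + [noise], lr=1e-3)

lambda_l2 = 0.01  # penalty weight (arbitrary value for illustration)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
pred = model(x + noise)                         # noise enters the forward pass somehow
loss = torch.nn.functional.mse_loss(pred, y)
loss = loss + lambda_l2 * noise.norm(p=2) ** 2  # add the L2 penalty directly to the loss
loss.backward()
optimizer.step()
```

My understanding is that adding the term to the loss like this makes the penalty’s gradient flow through Adam’s moment estimates, which is not quite the same as using the optimizer’s built-in `weight_decay` option. Is this the right approach, or is there a better way?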
