How do I apply weight decay (L2) selectively?

hughperkins · July 8, 2017, 12:37pm

Momentum and the like are handled by the Optimizer itself, but as far as I know, weight decay (L1 or L2) can be implemented as a separate step, after the optimizer step?

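So, it seems like you could just grab the parameter Tensors/Variables for your LSTM and, after each optimizer step, subtract a small fraction of each parameter's own value from it (that is what the gradient of an L2 penalty works out to)?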
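
Something like the following is what I have in mind. It is only a rough sketch: `TinyModel`, the layer sizes, the learning rate and the `decay` coefficient are all made up for illustration, and the optimizer is plain SGD with its built-in `weight_decay` left at the default of 0 so that the only decay applied is the manual one on the LSTM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyModel(nn.Module):
    """Hypothetical toy model: an LSTM followed by a linear head."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # use the last time step


model = TinyModel()
lr = 0.1       # assumed learning rate
decay = 0.01   # assumed L2 decay coefficient

# The optimizer's own weight_decay stays at 0; decay is done by hand below.
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(2, 5, 4)  # (batch, time, features), random dummy data
y = torch.randn(2, 1)

# Ordinary training step.
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Snapshots taken only so the effect of the decay step can be checked.
lstm_before = [p.detach().clone() for p in model.lstm.parameters()]
head_before = [p.detach().clone() for p in model.head.parameters()]

# Separate decay step, LSTM parameters only:
#   w <- w - lr * decay * w
# which is the SGD update for the penalty (decay / 2) * ||w||^2.
with torch.no_grad():
    for p in model.lstm.parameters():
        p.mul_(1.0 - lr * decay)
```

After this, the LSTM weights have shrunk by a factor of `1 - lr * decay` and the linear head is untouched. If the decay does not have to be a separate step, the `torch.optim` optimizers also accept per-parameter-group options, so passing the LSTM parameters in their own group with a non-zero `weight_decay` should achieve much the same thing.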