Hello, is your code correct? I recently ran into a similar problem.
Why is L2 regularization included in the optimizers? L1 and L2 regularization are modifications of the loss function. Wouldn’t it make more sense to add functions for calculating L1 and L2 penalties that you can then add to your loss before backpropagating?
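To make the suggestion concrete, here is a rough sketch of what such helper functions might look like. It is plain Python on a list of floats so that it stays framework-agnostic; the names `l1_penalty`, `l2_penalty`, `regularized_loss`, and the strength parameter `lam` are made up for illustration and are not part of any library's API.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Add both penalties to an already-computed data loss (hypothetical helper)."""
    return data_loss + l1_penalty(weights, l1) + l2_penalty(weights, l2)


# Toy values, chosen only to show how the pieces combine.
weights = [0.5, -1.0, 2.0]
total = regularized_loss(data_loss=1.0, weights=weights, l1=0.1, l2=0.01)
print(total)
```

In an autograd framework, the same expressions would presumably be written over the model's parameter tensors and added to the loss before calling the backward pass. One possible reason the L2 term tends to live in the optimizer instead: for plain gradient descent, the gradient of `lam * sum(w * w)` is just `2 * lam * w`, so it can be folded directly into the update step without touching the loss.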