As this post https://bbabenko.github.io/weight-decay/ explains, weight decay is equivalent to an L2 regularizer. Can introducing a parameter, say weight_decay_one, in an optimizer like SGD, as below, give an equivalent L1 regularizer?
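For reference, the L2 equivalence follows from one plain SGD step with learning rate $\eta$ on the penalized loss $L(w) + \frac{\lambda}{2}\lVert w\rVert_2^2$, whose gradient is $\nabla L(w) + \lambda w$:

$$
w_{t+1} = w_t - \eta\left(\nabla L(w_t) + \lambda w_t\right) = (1 - \eta\lambda)\,w_t - \eta\,\nabla L(w_t)
$$

The factor $(1 - \eta\lambda)$ is the "decay". PyTorch's SGD gets the same effect by adding `weight_decay * p` to the gradient.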
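In the optimizer's `step` method, where `d_p` is the gradient of parameter `p`:

```python
if weight_decay_one != 0:
    # d/dp (weight_decay_one * |p|) = weight_decay_one * sign(p)
    d_p.add_(torch.sign(p), alpha=weight_decay_one)
```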
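The intuition here is that the derivative of an L1 regularizer has constant magnitude: the derivative of λ|w| is λ·sign(w), i.e. ±λ depending on the sign of w (and 0 is a valid subgradient at w = 0).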
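A minimal pure-Python sketch of why the sign matters, using plain lists in place of PyTorch tensors. The helper names `l1_grad`, `constant_grad`, `sgd_step` and `numeric_grad` are hypothetical, chosen for illustration only. It checks that λ·sign(w) matches a finite-difference gradient of λ·Σ|wᵢ|, then compares one SGD step driven only by the penalty term: the sign-based term shrinks every weight toward zero, whereas adding the same constant λ to every gradient pushes negative weights further away from zero.

```python
def sign(x):
    """Sign of x as a float: -1.0, 0.0 or 1.0 (0.0 is a valid subgradient of |x| at 0)."""
    return float((x > 0) - (x < 0))


def l1_grad(w, lam):
    """(Sub)gradient of the L1 penalty lam * sum(|w_i|): lam * sign(w_i)."""
    return [lam * sign(wi) for wi in w]


def constant_grad(w, lam):
    """A plain constant lam added to every weight's gradient, ignoring its sign."""
    return [lam for _ in w]


def sgd_step(w, grad, lr):
    """One plain SGD step: w <- w - lr * grad."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of f at w (valid away from w_i = 0)."""
    out = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        out.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return out


lam, lr = 0.5, 0.1
w = [2.0, -1.0]


def penalty(v):
    """L1 penalty lam * sum(|v_i|)."""
    return lam * sum(abs(vi) for vi in v)


# The sign-based term agrees with the numerical gradient of the L1 penalty.
print(l1_grad(w, lam))  # → [0.5, -0.5]
print(numeric_grad(penalty, w))

# Penalty-only step: sign term moves both weights toward 0,
# the constant term moves the negative weight further from 0.
print(sgd_step(w, l1_grad(w, lam), lr))
print(sgd_step(w, constant_grad(w, lam), lr))
```

With the constant term, every weight drifts in the same direction regardless of its sign, so it does not behave like an L1 regularizer; scaling by sign(w) is what makes the update push each weight toward zero.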