L2 regularization toward initial value (for embedding)

You can use the optimizer’s weight_decay option for L2 regularization, but it won’t pull the weights toward their initial values. The decay term is computed from the current weights (the values from step t-1) and only shrinks them toward zero.

You’ll have to implement something like the Theano snippet yourself, right after the optimizer.step() call.
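For what it's worth, here is a rough sketch of what that could look like in PyTorch. It is not a port of the Theano snippet, just one way to do it: keep a copy of the initial embedding weights and, after each optimizer step, move the weights a little back toward that copy. The helper name `pull_toward_initial`, the `strength` value, and the toy loss are all made up for illustration.

```python
import torch
import torch.nn as nn


def pull_toward_initial(param, initial, lr, strength):
    """One gradient step on the penalty 0.5 * strength * ||param - initial||^2.

    Hypothetical helper (not part of PyTorch); updates `param` in place.
    """
    with torch.no_grad():
        param.sub_(lr * strength * (param - initial))


torch.manual_seed(0)
embedding = nn.Embedding(10, 4)
# Snapshot the weights before any training happens.
initial_weight = embedding.weight.detach().clone()

lr = 0.1
strength = 0.5  # assumed regularization strength, tune for your model
optimizer = torch.optim.SGD(embedding.parameters(), lr=lr)

indices = torch.tensor([1, 2, 3])
for _ in range(5):
    optimizer.zero_grad()
    # Toy loss standing in for the real training objective.
    loss = embedding(indices).pow(2).sum()
    loss.backward()
    optimizer.step()
    # Extra step: pull the weights back toward their initial values.
    pull_toward_initial(embedding.weight, initial_weight, lr, strength)
```

An alternative that should behave similarly with plain SGD is to add `0.5 * strength * (embedding.weight - initial_weight).pow(2).sum()` to the loss before calling `backward()`, which lets autograd handle the extra term instead of a manual update.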
