L2 regularization toward initial value (for embedding)

I would like to apply L2 regularization toward the initial values of my embedding model.

Here is a code snippet in Theano ([link](https://github.com/jwieting/iclr2016/blob/master/sentiment/lstm_model_sentiment.py)):

    l2 = 0.5*params.LC*sum(lasagne.regularization.l2(x) for x in self.network_params)
    if params.updatewords:
        return l2 + 0.5*params.LW*lasagne.regularization.l2(We - initial_We)
    else:
        return l2

In the paper, the authors say: “All models use L2 regularization on all parameters, except for the word embeddings, which are regularized back to their initial values with an L2 penalty”.
As I understand it, that means adding a term like 0.5 * LW * ||We - initial_We||^2 to the loss, as in the Theano snippet above, but I don’t know how to implement “regularized back to their initial values” in PyTorch.

I have tried this, but it did not work as expected:

    optimizer = optim.Adam([
        {'params': model.parameters(), 'lr': args.lr, 'weight_decay': args.wd},
        {'params': embedding_model.parameters(), 'lr': args.emblr, 'weight_decay': args.embwd}
    ])

You can use the optimizer’s weight_decay option for L2 regularization, but it won’t pull the weights toward their initial values; it only decays them toward zero at every step.

You’ll have to implement something like the Theano snippet yourself, right after the optim.step() call.
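
For example, here is a minimal sketch of that idea (the module names, sizes, and dummy loss below are illustrative, not taken from your code): snapshot the embedding weights once, and after every optim.step() nudge them back toward that snapshot, which is a gradient step on 0.5 * emb_reg * ||We - We_init||^2.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    vocab_size, emb_dim, lr, emb_reg = 100, 8, 1e-3, 1e-2   # illustrative values

    embedding = nn.Embedding(vocab_size, emb_dim)
    init_weight = embedding.weight.data.clone()    # snapshot of the initial embeddings

    optimizer = optim.Adam(embedding.parameters(), lr=lr)   # no weight_decay on the embeddings

    for step in range(10):
        optimizer.zero_grad()
        idx = torch.randint(0, vocab_size, (4,))
        loss = embedding(idx).sum()                # dummy stand-in for criterion(output, target)
        loss.backward()
        optimizer.step()
        # pull the embeddings back toward their initial values:
        # the gradient of 0.5 * emb_reg * ||We - We_init||^2 is emb_reg * (We - We_init)
        embedding.weight.data.add_(-lr * emb_reg * (embedding.weight.data - init_weight))

Alternatively, you can add 0.5 * emb_reg * (embedding.weight - init_weight).pow(2).sum() to the loss before calling backward(), which is closer to what the Theano snippet does and lets autograd compute the extra gradient for you.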

Here is how I implemented my custom L2. Can anybody verify whether it is correct?

Here is the getParameters function, which takes all parameters of the sub-model and flattens them so I can compute the norm easily:

    def getParameters(self):
        """
        Get flattened parameters.
        Note that getParameters and parameters() are not equivalent here:
        getParameters does not include the parameters of the output module.
        :return: 1d tensor
        """
        params = []
        for m in [self.ix, self.ih, self.fx, self.fh, self.ox, self.oh, self.ux, self.uh]:
            # we do not take the parameters of the output module
            l = list(m.parameters())
            params.extend(l)

        one_dim = [p.view(p.numel()) for p in params]
        params = torch.cat(one_dim)
        return params

I add my custom L2 to err before I call backward(), then step.
Only err is a Variable (it is the output of criterion(output, target)); l2_model, l2_emb_params, and batch_size are not Variables (they are plain floats and an int).

    params = self.model.getParameters()
    params_norm = params.data.norm()
    l2_model = 0.5*self.args.reg*params_norm*params_norm
    emb_params = list(self.embedding_model.parameters())[0]
    emb_params_norm = (emb_params.data - self.emb_params_init).norm()
    l2_emb_params = 0.5 * self.args.embreg * emb_params_norm * emb_params_norm
    err = (err + l2_model + l2_emb_params) / batch_size

    err.backward()

    # after the loop over batch_size samples
    optim.step()
    optim.zero_grad()