```python
class LCNPModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, inputs):
        super(LCNPModel, self).__init__()
        # ...
        self.encoder_nt = nn.Embedding(self.nnt, self.dnt)
        self.word2vec_plus = nn.Embedding(self.nt, self.dt)
        self.word2vec = nn.Embedding(self.nt, self.dt)

        self.LSTM = nn.LSTM(self.dt, self.dhid, self.nlayers,
                            batch_first=True, bias=True)
        # the initial states h0 and c0 of the LSTM
        self.h0 = (Variable(torch.zeros(self.nlayers, self.bsz, self.dhid)),
                   Variable(torch.zeros(self.nlayers, self.bsz, self.dhid)))
        # ...
        self.init_weights(initrange)
        # materialize into a list: itertools.ifilter returns a one-shot
        # iterator, which is empty after its first traversal
        self.l2 = list(itertools.ifilter(lambda p: p.requires_grad,
                                         self.parameters()))

    def init_weights(self, initrange=1.0):
        self.word2vec_plus.weight.data.fill_(0)
        self.word2vec.weight.data = self.term_emb
        self.encoder_nt.weight.data = self.nonterm_emb
        self.word2vec.weight.requires_grad = False
        self.encoder_nt.weight.requires_grad = False
        # ...
```
Hi, above is part of my code. It has two embeddings, word2vec and word2vec_plus. I want an embedding that is initialized from pretrained word2vec vectors but keeps training further. When I optimize, I take the L2 norm of word2vec_plus as a penalty: since the effective embedding is word2vec + word2vec_plus, this norm is the distance between the current embedding and the original pretrained one, which makes sense because I don't want the newly trained embedding to drift too far from the pretrained vectors.
My problem is: when I set word2vec.weight.requires_grad to False and optimize only the parameters that require gradients, training works, but it becomes very slow after the first round. However, if I comment out everything involving word2vec and use only word2vec_plus, everything is very fast. Since word2vec is effectively a constant in my code, it shouldn't slow down training at all.
So my question is: is there any way to speed this up, or am I doing something wrong?
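For reference, here is a minimal sketch of the setup I mean (class and variable names are illustrative, not my real code): the frozen pretrained table is registered as a buffer rather than an nn.Embedding parameter, so it never enters autograd or the optimizer, and the trainable offset starts at zero so its L2 norm is exactly the distance from the pretrained vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingSketch(nn.Module):
    """Frozen pretrained table (buffer) + trainable offset (parameter)."""

    def __init__(self, vocab_size, dim, pretrained):
        super().__init__()
        # A buffer is saved with the module but excluded from parameters()
        # and from autograd, so the optimizer never touches it.
        self.register_buffer("w2v", pretrained.clone())
        # Trainable offset, initialized to zero so the initial embedding
        # is exactly the pretrained word2vec table.
        self.delta = nn.Embedding(vocab_size, dim)
        nn.init.zeros_(self.delta.weight)

    def forward(self, idx):
        # Effective embedding = frozen pretrained vectors + learned offset.
        return F.embedding(idx, self.w2v) + self.delta(idx)

    def l2_penalty(self):
        # Distance from the pretrained embedding == norm of the offset,
        # because the offset starts at zero.
        return self.delta.weight.pow(2).sum()

vocab, dim = 100, 16
pretrained = torch.randn(vocab, dim)
model = EmbeddingSketch(vocab, dim, pretrained)
# Only trainable parameters go to the optimizer (here: just delta.weight).
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

idx = torch.tensor([1, 2, 3])
loss = model(idx).sum() + 1e-3 * model.l2_penalty()
loss.backward()
opt.step()
```

This is just a sketch of the behaviour I'm after; my actual model wraps the LSTM and the nonterminal encoder around it as in the snippet above.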
Thanks a lot!