Hi,
After experimenting with different language models, I wanted to make a change to the PyTorch word_language_model example.
I'm not sure whether the idea is good in principle, but it should at least be possible anyway…
Basically, I want to take out the encoder (embedding) / decoder part and train the language model on top of an existing word embedding, which is trained beforehand with gensim (the source code for that is in my repo, linked below).
I train a Word2Vec model of size 200 before training the language model. The corpus tensors then have the shape ntoken * 200, and each batch has the shape bptt (sequence length) * batch_size * 200.
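For reference, this is roughly how those tensors get built (a paraphrased sketch, not the exact code from the repo; `w2v` is the gensim model and the function names are just illustrative):

import torch

def vectorize(tokens, w2v):
    # look up each token's 200-dim vector in the gensim model -> ntoken x 200
    return torch.stack([torch.from_numpy(w2v.wv[t]) for t in tokens])

def batchify(data, batch_size):
    # same layout as in the original example, just with a trailing vector dimension
    nbatch = data.size(0) // batch_size
    data = data[:nbatch * batch_size]
    return data.view(batch_size, nbatch, -1).transpose(0, 1).contiguous()  # nbatch x batch_size x 200

def get_batch(source, i, bptt):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]              # bptt x batch_size x 200
    target = source[i + 1:i + 1 + seq_len]    # the next words' vectors, same shape
    return data, target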
The model is therefore rather simple: the input goes straight into the RNN and then through a linear layer that brings it back to size 200, so it can be compared with the target.
class RNNModel(nn.Module):
    """Container module with a recurrent module and a linear output layer (no encoder/decoder)."""

    def __init__(self, rnn_type, nhid, nlayers, dropout=0.5, word_embedding=None):
        super(RNNModel, self).__init__()
        self.wem = word_embedding
        # self.drop = nn.Dropout(dropout)
        if rnn_type in ['LSTM', 'GRU']:
            self.rnn = getattr(nn, rnn_type)(word_embedding.vector_size, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError("""An invalid option for `--model` was supplied,
                                    options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(word_embedding.vector_size, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        self.LOut = nn.Linear(nhid, word_embedding.vector_size)
        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def forward(self, input, hidden):
        # self.rnn.flatten_parameters()
        output, hidden = self.rnn(input, hidden)
        # output = self.drop(output)
        # print('output', output.data.shape)
        output = self.LOut(output)
        return output, hidden
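So instead of an embedding layer, the model just keeps a reference to the gensim model and reads vector_size from it. It gets instantiated roughly like this (the hyperparameter values here are just placeholders, not the ones I actually used):

model = RNNModel('LSTM', nhid=200, nlayers=2, dropout=0.5, word_embedding=w2v)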
There are also some changes in the main module.
First I had to change the loss function, since the target tensor no longer contains classes (word ids) but word vectors. I used the MSELoss function.
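So the criterion just compares two float tensors of the same shape (this is my understanding of the shapes, paraphrased from the repo):

criterion = nn.MSELoss()

output, hidden = model(data, hidden)   # output: bptt x batch_size x 200
loss = criterion(output, target)       # target: the next words' vectors, same shape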
Second, I tried to replace the manual parameter update

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)

with optim.SGD, as I have seen it used in other language model examples. I tried several learning rates: the example's default of 20, 1, 0.005, and a few others in that range.
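The inner training loop then looks roughly like this (paraphrased, not the exact code from the repo; `get_batch`, `repackage_hidden`, and `clip` are the same helpers/values as in the original example):

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=lr)

for i in range(0, train_data.size(0) - 1, bptt):
    data, target = get_batch(train_data, i, bptt)
    hidden = repackage_hidden(hidden)            # detach the hidden state from the old graph
    optimizer.zero_grad()
    output, hidden = model(data, hidden)
    loss = criterion(output, target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()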
The result is that the model basically doesn't learn anything. It starts with an average loss of 1.15 and maybe gets down to 0.94. The word output is in any case pretty random :).
I played a bit with the hyperparameters, without any success.
Before giving up, I wanted to ask if somebody could give me a hint about what is going wrong.
My changes are in this forked repo, on the wem_model branch.
It includes the word2vec model, which can also be created with the embedding.py script; that only takes a couple of minutes on the wiki-2 dataset. Obviously all of this needs gensim.
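In case it helps, training such a model with gensim is only a few lines (a sketch, not the exact contents of embedding.py; the file path and parameters are illustrative, and older gensim versions call the vector_size argument size):

from gensim.models import Word2Vec

sentences = [line.split() for line in open('data/wikitext-2/train.txt')]
w2v = Word2Vec(sentences, vector_size=200, min_count=1, workers=4)
w2v.save('word2vec_wiki2.model')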
I also included a notebook that goes through main.py step by step, which I used to try to understand what is going on.
What’s going on?!