Hi,
After experimenting with different language models, I wanted to make a change to the PyTorch word_language_model example.
I'm not sure whether the idea is good in principle, but it should at least be possible anyway…
Basically, I want to take out the encoder (embedding) / decoder part and train the language model on top of an existing word embedding, which is trained beforehand with gensim (the source code for that is in my repo, linked below).
I train a Word2Vec model of size 200 before training the language model. The corpus tensors then have the shape ntoken * 200, and each batch has the shape bptt (sequence length) * batch_size * 200.
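For reference, this is roughly how those tensors get built (a paraphrased sketch, not the exact code from the repo; `w2v` is the gensim model and the function names are just illustrative):

import torch

def vectorize(tokens, w2v):
    # look up each token's 200-dim vector in the gensim model -> ntoken x 200
    return torch.stack([torch.from_numpy(w2v.wv[t]) for t in tokens])

def batchify(data, batch_size):
    # same layout as in the original example, just with a trailing vector dimension
    nbatch = data.size(0) // batch_size
    data = data[:nbatch * batch_size]
    return data.view(batch_size, nbatch, -1).transpose(0, 1).contiguous()  # nbatch x batch_size x 200

def get_batch(source, i, bptt):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]              # bptt x batch_size x 200
    target = source[i + 1:i + 1 + seq_len]    # the next words' vectors, same shape
    return data, target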
The model is therefore rather simple: the input goes straight into the RNN and then through a linear layer that brings it back to size 200, so it can be compared with the target.
class RNNModel(nn.Module):
    """Container module with a recurrent module and a linear output layer (no encoder/decoder)."""

    def __init__(self, rnn_type, nhid, nlayers, dropout=0.5, word_embedding=None):
        super(RNNModel, self).__init__()
        self.wem = word_embedding
        # self.drop = nn.Dropout(dropout)
        if rnn_type in ['LSTM', 'GRU']:
            self.rnn = getattr(nn, rnn_type)(word_embedding.vector_size, nhid, nlayers, dropout=dropout)
        else:
            try:
                nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[rnn_type]
            except KeyError:
                raise ValueError("""An invalid option for `--model` was supplied,
                                    options are ['LSTM', 'GRU', 'RNN_TANH' or 'RNN_RELU']""")
            self.rnn = nn.RNN(word_embedding.vector_size, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)
        self.LOut = nn.Linear(nhid, word_embedding.vector_size)
        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers

    def forward(self, input, hidden):
        # self.rnn.flatten_parameters()
        output, hidden = self.rnn(input, hidden)
        # output = self.drop(output)
        # print('output', output.data.shape)
        output = self.LOut(output)
        return output, hidden
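So instead of an embedding layer, the model just keeps a reference to the gensim model and reads vector_size from it. It gets instantiated roughly like this (the hyperparameter values here are just placeholders, not the ones I actually used):

model = RNNModel('LSTM', nhid=200, nlayers=2, dropout=0.5, word_embedding=w2v)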
There are also some changes in the main module.
First I had to change the loss function, since the target tensor no longer contains classes (word ids) but word vectors. I used the MSELoss function.
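So the criterion just compares two float tensors of the same shape (this is my understanding of the shapes, paraphrased from the repo):

criterion = nn.MSELoss()

output, hidden = model(data, hidden)   # output: bptt x batch_size x 200
loss = criterion(output, target)       # target: the next words' vectors, same shape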
Second, I tried to replace the manual parameter update

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)

with optim.SGD, as I have seen it used in other language model examples. I tried several learning rates: the example's default of 20, 1, 0.005, and a few others in that range.
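The inner training loop then looks roughly like this (paraphrased, not the exact code from the repo; `get_batch`, `repackage_hidden`, and `clip` are the same helpers/values as in the original example):

import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=lr)

for i in range(0, train_data.size(0) - 1, bptt):
    data, target = get_batch(train_data, i, bptt)
    hidden = repackage_hidden(hidden)            # detach the hidden state from the old graph
    optimizer.zero_grad()
    output, hidden = model(data, hidden)
    loss = criterion(output, target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()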
The result is that the model basically doesn't learn anything. It starts with an average loss of 1.15 and maybe gets down to 0.94. The word output is in any case pretty random :).
I played a bit with the hyperparameters, without any success.
Before giving up, I wanted to ask if somebody could give me a hint about what is going wrong.
My changes are in this forked repo, on the wem_model branch.
It includes the word2vec model, which can also be created with the embedding.py script; that only takes a couple of minutes on the wiki-2 dataset. Obviously all of this needs gensim.
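In case it helps, training such a model with gensim is only a few lines (a sketch, not the exact contents of embedding.py; the file path and parameters are illustrative, and older gensim versions call the vector_size argument size):

from gensim.models import Word2Vec

sentences = [line.split() for line in open('data/wikitext-2/train.txt')]
w2v = Word2Vec(sentences, vector_size=200, min_count=1, workers=4)
w2v.save('word2vec_wiki2.model')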
I also included a notebook that goes through main.py step by step, which I used to try to understand what is going on.
What’s going on?!