Changing and using a pretrained output linear projection layer in LSTM

nilinykh · January 15, 2020, 1:43pm

I have a question about reusing the pretrained output linear projection layer of one LSTM network in another.

What I have:

A pretrained fully connected layer that was used to map LSTM outputs to word probabilities in a language generation model, something like this:

self.output_linear_projection = nn.Linear(self.wordRNN_dim, self.vocab_size)

Here, self.wordRNN_dim is 512 (the hidden size of the LSTM), and self.vocab_size is the number of words in the vocabulary. For this pretrained model, the vocabulary size is 10509: 10508 words, plus a last element that is the projection of the <start> and <end> tokens (the same projection is used for both).
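To make the shapes concrete, here is a minimal sketch of such a layer using the sizes given above. The variable names and the batch size of 4 are only for illustration, and the layer here is freshly initialized rather than loaded from the pretrained checkpoint.

```python
import torch
import torch.nn as nn

word_rnn_dim = 512   # LSTM hidden size, as stated in the post
vocab_size = 10509   # 10508 words + one shared <start>/<end> entry

# Freshly initialized stand-in for the pretrained projection layer
output_linear_projection = nn.Linear(word_rnn_dim, vocab_size)

# nn.Linear stores its weight as (out_features, in_features)
weight_shape = tuple(output_linear_projection.weight.shape)
bias_shape = tuple(output_linear_projection.bias.shape)

# A batch of 4 hidden states (illustrative size) maps to 4 rows of logits,
# one logit per vocabulary entry
hidden = torch.randn(4, word_rnn_dim)
logits = output_linear_projection(hidden)
logits_shape = tuple(logits.shape)
```

Each row of the weight matrix corresponds to one vocabulary entry, so a vocabulary with one extra token needs one extra row and one extra bias element.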

My own language generation model, in which I want to use the output linear projection layer from the pretrained model. My model’s vocabulary also includes a <pad> token, which is absent from the pretrained model, so my vocabulary size is 10510.

The question:

If I want to use this pretrained output linear projection layer in my model, what should or could I do? Is it conventional not to project <pad> tokens? If so, should I ignore <pad> in my projection, and how? And if there are steps I should take, any hints on how to do them in PyTorch?
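For reference, one possible approach (a sketch, not necessarily the conventional answer) is to build a new projection with 10510 outputs, copy the pretrained weights into the first 10509 rows, and exclude <pad> targets from the loss with the ignore_index argument of nn.CrossEntropyLoss. The sketch assumes that <pad> is appended as the last index (10509) and that the other indices match the pretrained vocabulary. The names pretrained, new_projection and pad_idx are hypothetical, and the "pretrained" layer is a randomly initialized stand-in for the one that would really be loaded.

```python
import torch
import torch.nn as nn

hidden_dim = 512
old_vocab_size = 10509          # pretrained vocabulary (no <pad>)
new_vocab_size = 10510          # my vocabulary (with <pad>)
pad_idx = new_vocab_size - 1    # ASSUMPTION: <pad> is the last index

# Stand-in for the pretrained layer; in practice its weights would be loaded
pretrained = nn.Linear(hidden_dim, old_vocab_size)

# New projection with one extra output row for <pad>
new_projection = nn.Linear(hidden_dim, new_vocab_size)
with torch.no_grad():
    # Copy pretrained rows; the <pad> row keeps its random initialization
    new_projection.weight[:old_vocab_size].copy_(pretrained.weight)
    new_projection.bias[:old_vocab_size].copy_(pretrained.bias)

# Positions whose target is <pad> contribute nothing to the loss
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

hidden = torch.randn(3, hidden_dim)        # illustrative batch of 3 steps
targets = torch.tensor([5, 42, pad_idx])   # last position is padding
logits = new_projection(hidden)
loss = criterion(logits, targets)
loss.backward()
```

With the default mean reduction, the loss is averaged over the non-ignored positions only. Note that ignore_index does not stop the model from assigning probability to <pad> at generation time; it only removes padded positions from training.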