Changing and using a pretrained output linear projection layer in LSTM

I have a question about reusing the pretrained output linear projection layer of one LSTM network in another.

What I have:

  1. A pretrained fully connected layer that projects LSTM hidden states to scores over the vocabulary (word probabilities after softmax) in a language generation model, something like this:

self.output_linear_projection = nn.Linear(self.wordRNN_dim, self.vocab_size)

  • Here, self.wordRNN_dim is 512 (the hidden size of the LSTM), and self.vocab_size is the number of words. For this pretrained model, the vocabulary size is 10509: 10508 words plus one shared slot for the <start> and <end> tokens (the same projection row is used for both).
  2. My own language generation model, for which I want to reuse the output linear projection layer from the pretrained model. My model’s vocabulary also includes a <pad> token, which is absent from the pretrained model, so my vocabulary size is 10510.
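For concreteness, one way to handle the size mismatch (a sketch, assuming the first 10509 indices line up between the two vocabularies and `<pad>` takes the last index; `pretrained_proj` stands in for the layer loaded from a checkpoint) is to build a larger `nn.Linear` and copy the pretrained rows into it:

```python
import torch
import torch.nn as nn

wordRNN_dim = 512              # LSTM hidden size, as in the pretrained model
pretrained_vocab_size = 10509  # 10508 words + shared <start>/<end> slot
new_vocab_size = 10510         # adds a <pad> token at the last index

# Stand-in for the pretrained projection; in practice this would be
# loaded from a checkpoint instead of freshly initialized.
pretrained_proj = nn.Linear(wordRNN_dim, pretrained_vocab_size)

# New projection for the larger vocabulary.
new_proj = nn.Linear(wordRNN_dim, new_vocab_size)

# Copy the pretrained weights row by row; the extra <pad> row keeps
# its default random initialization (it is never trained as a target
# if the loss ignores <pad>, see below).
with torch.no_grad():
    new_proj.weight[:pretrained_vocab_size].copy_(pretrained_proj.weight)
    new_proj.bias[:pretrained_vocab_size].copy_(pretrained_proj.bias)
```

This keeps the pretrained rows intact while leaving one trainable (or ignorable) row for `<pad>`.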

The question:

  • If I want to use this pretrained output linear projection layer in my model, what should/could I do? Is it conventional not to project <pad> tokens? If so, how should I ignore them in my projection? Any hints on how to do this in PyTorch would be appreciated.
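Regarding ignoring `<pad>`: the usual approach in PyTorch is not to skip the projection itself but to mask padded positions out of the loss, e.g. via `ignore_index` in `nn.CrossEntropyLoss`. A minimal sketch (the index 10509 for `<pad>` and the tensor shapes are assumptions for illustration):

```python
import torch
import torch.nn as nn

pad_idx = 10509  # hypothetical index of <pad> in the new vocabulary

# ignore_index makes the loss skip target positions equal to pad_idx,
# so the model is never trained to predict <pad> and padded timesteps
# contribute nothing to the gradient.
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

logits = torch.randn(4, 10510)                     # (batch, vocab) scores
targets = torch.tensor([3, pad_idx, 7, pad_idx])   # two padded positions
loss = criterion(logits, targets)                  # averaged over the 2 real tokens only
```

With the default mean reduction, the loss is averaged over the non-ignored positions only.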