I have created a neural network for sentiment analysis using bidirectional LSTM layers and pre-trained GloVe embeddings.
During training I noticed that the
nn.Embedding layer with the frozen embedding weights uses the whole vocabulary of GloVe:
(output of the instantiated model object)
(embedding): Embedding(400000, 50, padding_idx=0)
The embedding layer is set up like this:
self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True, padding_idx=self.padding_idx)
where embedding_matrix = glove_vectors.vectors and glove_vectors = torchtext.vocab.GloVe(name='6B', dim=50) (source).
The 400,000 is the vocabulary size of the glove_vectors object (meaning 400,000 pre-trained words in total).
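As far as I understand, freezing the embedding should mean the big matrix adds no trainable parameters at all; a minimal sketch checking this (the small random 1000×50 matrix here just stands in for the real 400,000×50 GloVe matrix):

```python
import torch
import torch.nn as nn

# Toy stand-in for the frozen GloVe embedding (real one is 400000 x 50)
emb = nn.Embedding.from_pretrained(torch.randn(1000, 50), freeze=True, padding_idx=0)

# Count trainable vs. total parameters
trainable = sum(p.numel() for p in emb.parameters() if p.requires_grad)
total = sum(p.numel() for p in emb.parameters())
print(trainable, total)  # -> 0 50000
```

So the frozen matrix contributes to memory and the lookup, but not to the gradient computation.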
Then I noticed that training the LSTM network takes approximately 3 to 5 minutes per epoch, which seems quite long for only 150,000 trainable parameters. I was wondering whether this is caused by using the whole embedding matrix with 400,000 words, or whether it is normal for a bidirectional LSTM.
Is it worth creating a minimized version of the GloVe embedding matrix containing only the words that appear in my sentences, or does using the whole GloVe embedding matrix not affect training performance?
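In case it helps frame the question, this is roughly how I imagine the minimized matrix would be built (the tiny hand-made vocabulary and random vectors below stand in for the real glove_vectors.stoi / glove_vectors.vectors from torchtext):

```python
import torch
import torch.nn as nn

# Stand-ins for glove_vectors.stoi and glove_vectors.vectors
full_stoi = {'<pad>': 0, 'the': 1, 'movie': 2, 'was': 3, 'great': 4, 'terrible': 5}
full_vectors = torch.randn(len(full_stoi), 50)

# Only the words that actually occur in my sentences
corpus_vocab = ['<pad>', 'movie', 'great']

# New word -> index mapping and the reduced embedding matrix
reduced_stoi = {w: i for i, w in enumerate(corpus_vocab)}
reduced_vectors = torch.stack([full_vectors[full_stoi[w]] for w in corpus_vocab])

embedding = nn.Embedding.from_pretrained(reduced_vectors, freeze=True, padding_idx=0)
print(embedding)  # -> Embedding(3, 50, padding_idx=0)
```

The sentences would of course have to be re-indexed with reduced_stoi instead of the full GloVe vocabulary.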