Hi, I have some problems understanding the embedding layer in PyTorch. I know that an embedding layer is a lookup table with dimensions vocab_size x embedding_dim, and that we can retrieve embedding vectors from it by their indices. Suppose I want to use pretrained word embedding vectors obtained from a GloVe model.
Here is part of my code:
self.embedding = nn.Embedding(vocab_size, hidden_size)

def forward(self, input, hidden):
    embedded = self.embedding(input)
my_embeddings is a vocab_size x embedding_dim matrix (the pretrained GloVe vectors).
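For context, here is a minimal sketch of how such a pretrained matrix can be loaded into an nn.Embedding layer (the random my_embeddings tensor below is a stand-in for the real GloVe matrix, which I assume has one row per word index):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 5, 3

# stand-in for the real GloVe matrix; assumption: row i is the vector for word id i
my_embeddings = torch.randn(vocab_size, embedding_dim)

# copy the pretrained weights into the layer; freeze=False keeps them trainable
embedding = nn.Embedding.from_pretrained(my_embeddings, freeze=False)

indices = torch.tensor([0, 2, 4])
vectors = embedding(indices)  # shape: (3, embedding_dim)
```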
1- Is it correct that vocab_size is the count of unique words in the train dataset?
For the missing words of the train dataset (the words that are not included in the pretrained model), I have built an embedding vector that is the average of all of the word embedding vectors in the train dataset.
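That averaging step can be sketched like this (a toy example; the shapes and the index chosen for the unknown-word row are assumptions):

```python
import torch

# toy pretrained vectors for the words that ARE in the pretrained model
known_vectors = torch.randn(4, 3)  # 4 known words, embedding_dim = 3

# the strategy described above: the unknown-word vector is the mean of all known vectors
unk_vector = known_vectors.mean(dim=0)  # shape: (3,)

# build the full embedding matrix with the unknown-word row appended at the end
weights = torch.cat([known_vectors, unk_vector.unsqueeze(0)], dim=0)  # shape: (5, 3)
```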
I don't have a problem with the train dataset, but I do have a problem with the test dataset.
2- Do I have to use the same my_embeddings as in the train phase (the my_embeddings that is constructed from the train dataset)?
3- What should I do for the words from the test dataset that are not included in my_embeddings?
Thanks in advance.