I have obtained the GloVe vectors for each word in a sentence, but I cannot figure out how to obtain an embedding for the whole sentence. Would averaging the word vectors work? If not, please suggest an alternative. The method must preserve the semantic meaning of the sentence.
You should use torch.nn.Embedding.
OK, first you should read the GloVe vectors from a file.
import json
import numpy as np
import torch

def load_embeddings(words_id2vector_filename, words_count, embedding_dim=100):
    # rows stay zero for any word id missing from the file
    embeddings = np.zeros([words_count, embedding_dim], np.float32)
    with open(words_id2vector_filename, 'r') as f:
        word2vec = json.load(f)  # maps word id (as a string) to its vector
    for word_id, vec in word2vec.items():
        embeddings[int(word_id)] = vec
    return torch.from_numpy(embeddings)

glove_vectors = load_embeddings(...)
# init an embedding layer
embed_layer = torch.nn.Embedding(words_count, embedding_dim)
# use the GloVe vectors to initialize the embedding weights
embed_layer.weight = torch.nn.Parameter(glove_vectors)
In order to use embed_layer, you have to map each word to its index id, which also corresponds to its position in GloVe. For example, if "hello" is the first word in GloVe, its word id must be 0.
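As a minimal sketch of that lookup, assuming a hypothetical word2id vocabulary that matches the row order of glove_vectors:

# hypothetical vocabulary: word -> id, matching the rows of glove_vectors
word2id = {"hello": 0, "world": 1}

sentence = ["hello", "world"]
ids = torch.tensor([word2id[w] for w in sentence])  # shape: (seq_len,)
word_vecs = embed_layer(ids)                        # shape: (seq_len, embedding_dim)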
I had figured that out earlier, but a word embedding and a sentence embedding are not the same thing.
You should not simply concatenate the GloVe vectors of each word to form the feature vector of the sentence.
How do I represent a sentence? That is my question.
Maybe you could use an LSTM or a bi-LSTM to encode the sentence.
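Here is a sketch of that idea; the hidden size and the choice to concatenate the final hidden states of both directions are assumptions, not a prescribed recipe:

class SentenceEncoder(torch.nn.Module):
    def __init__(self, embed_layer, embedding_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = embed_layer  # the GloVe-initialized embedding from above
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) tensor of word indices
        vecs = self.embed(word_ids)    # (batch, seq_len, embedding_dim)
        _, (h_n, _) = self.lstm(vecs)  # h_n: (2, batch, hidden_dim)
        # concatenate the final hidden states of both directions
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)

The returned vector serves as the sentence embedding; it becomes semantically meaningful once the encoder is trained on a downstream task.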
The paper "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms" will give you the desired answers.
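That paper argues that simple pooling over word embeddings is a surprisingly strong baseline, which also answers the original question about averaging. A minimal sketch of average and max pooling, reusing the word_vecs lookup from earlier:

# word_vecs: (seq_len, embedding_dim) GloVe vectors for one sentence
avg_sentence_emb = word_vecs.mean(dim=0)        # average pooling
max_sentence_emb = word_vecs.max(dim=0).values  # max pooling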