I have obtained the GloVe vectors for each word in a sentence, but I cannot figure out how to obtain an embedding for the whole sentence. Would averaging the word vectors work? If not, please suggest an alternative. The method must preserve the semantic meaning of the sentence.
You should use torch.nn.Embedding.
OK, first you should read the GloVe vectors from a file.
import json
import numpy as np
import torch

def load_embeddings(words_id2vector_filename, words_count, embedding_dim=100):
    # rows stay zero for any word id missing from the file
    embeddings = np.zeros([words_count, embedding_dim], np.float32)
    with open(words_id2vector_filename, 'r') as f:
        word2vec = json.load(f)  # maps word id (as a string) to its vector
    for word_id, vec in word2vec.items():
        embeddings[int(word_id)] = vec
    return torch.from_numpy(embeddings)

glove_vectors = load_embeddings(...)
# init an embedding layer
embed_layer = torch.nn.Embedding(words_count, embedding_dim)
# use the GloVe vectors to initialize the embedding weights
embed_layer.weight = torch.nn.Parameter(glove_vectors)
In order to use embed_layer, you have to map each word to its index id, which also corresponds to its position in GloVe. For example, if "hello" is the first word in GloVe, its word id must be 0.
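As a minimal sketch of that lookup, assuming a hypothetical word2id vocabulary that matches the row order of glove_vectors:

# hypothetical vocabulary: word -> id, matching the rows of glove_vectors
word2id = {"hello": 0, "world": 1}

sentence = ["hello", "world"]
ids = torch.tensor([word2id[w] for w in sentence])  # shape: (seq_len,)
word_vecs = embed_layer(ids)                        # shape: (seq_len, embedding_dim)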
I had figured that out earlier, but a word embedding and a sentence embedding are not the same thing.
You should not simply concatenate the GloVe vectors of each word to form the feature vector of the sentence.
How do I represent a sentence? That is my question.
Maybe you could use an LSTM or a bi-LSTM to encode the sentence.
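Here is a sketch of that idea; the hidden size and the choice to concatenate the final hidden states of both directions are assumptions, not a prescribed recipe:

class SentenceEncoder(torch.nn.Module):
    def __init__(self, embed_layer, embedding_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = embed_layer  # the GloVe-initialized embedding from above
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) tensor of word indices
        vecs = self.embed(word_ids)    # (batch, seq_len, embedding_dim)
        _, (h_n, _) = self.lstm(vecs)  # h_n: (2, batch, hidden_dim)
        # concatenate the final hidden states of both directions
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)

The returned vector serves as the sentence embedding; it becomes semantically meaningful once the encoder is trained on a downstream task.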
The paper "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms" will give you the desired answers.
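That paper argues that simple pooling over word embeddings is a surprisingly strong baseline, which also answers the original question about averaging. A minimal sketch of average and max pooling, reusing the word_vecs lookup from earlier:

# word_vecs: (seq_len, embedding_dim) GloVe vectors for one sentence
avg_sentence_emb = word_vecs.mean(dim=0)        # average pooling
max_sentence_emb = word_vecs.max(dim=0).values  # max pooling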