If it helps, you can have a look at my code for that. You only need the create_embedding_matrix method; generate_embedding_matrix was my initial solution, but there’s no need to load and store all word embeddings, since you only need those that match your vocabulary.
max_index reflects the information from your vocabulary, with word_to_index mapping each word to a unique index from 0..max_index (now that I’ve written it, you probably don’t need max_index as an extra parameter). I use my own implementation of a vectorizer, but torchtext should give you similar information.
A full example of how it works can be seen in this notebook.
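Roughly, the idea looks like this (a minimal sketch, not the exact code from the notebook; glove_path and embedding_dim are just placeholder names):

```python
import numpy as np

def create_embedding_matrix(glove_path, word_to_index, embedding_dim):
    # One row per vocabulary index; words without a GloVe vector keep
    # all-zero rows (random initialization would also work).
    num_rows = max(word_to_index.values()) + 1
    embedding_matrix = np.zeros((num_rows, embedding_dim), dtype=np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in word_to_index:  # keep only words in your vocabulary
                embedding_matrix[word_to_index[word]] = np.asarray(values, dtype=np.float32)
    return embedding_matrix
```

The resulting matrix can then go straight into an embedding layer, e.g., nn.Embedding.from_pretrained(torch.from_numpy(embedding_matrix)).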
To your other questions:
- There’s hardly ever one best solution out there, and new types of embeddings are proposed on practically a weekly basis. My tip would be: just get something running, see how it works, and then try different alternatives to compare.
- Of course you can get the embedding for a specific word. That’s essentially the content of the GloVe files. Each line contains first the word and then the n values of the embedding vector (with n being the vector size, e.g., 50, 100, 300); see the sketch after this list.
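For example, reading a single word’s vector straight from the file could look like this (a minimal sketch; get_glove_vector is just an illustration, and the file name assumes the standard glove.6B.50d.txt download):

```python
def get_glove_vector(glove_path, target_word):
    # Each line is "<word> v1 v2 ... vn"; scan until the word matches.
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] == target_word:
                return [float(v) for v in parts[1:]]
    return None  # word is not in this GloVe vocabulary

vector = get_glove_vector("glove.6B.50d.txt", "coffee")
print(len(vector))  # 50, matching the 50-dimensional file
```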