How to download and use GloVe vectors?

First of all, I would like to know whether GloVe is the best pre-trained embedding for an NLP application.
Secondly, how can I get the GloVe embeddings in PyTorch?
Thirdly, can I, for example, extract the embeddings for specific words, like ‘king’ and ‘queen’?

Thanks in advance :slight_smile:

  1. I’m not sure, but our NLP experts might know the answer. :wink:

  2. This blog post describes how to load and use the embeddings. Note that you can now also use the classmethod from_pretrained to load the weights (see the sketch after this list).

  3. Yes, this is also shown in the blog post.
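
For points 2 and 3, here is a minimal sketch of the from_pretrained approach. The weight matrix and the index for ‘king’ are placeholders; in practice you would first fill the matrix with the real GloVe vectors as described in the blog post:

```python
import torch
import torch.nn as nn

# Placeholder weight matrix; in practice this holds the real GloVe
# vectors, one row per vocabulary index.
weights = torch.randn(10, 100)

# from_pretrained copies the weights into an nn.Embedding layer and
# freezes them by default (pass freeze=False to fine-tune them).
embedding = nn.Embedding.from_pretrained(weights)

# Look up the embedding for a specific word via its vocabulary index,
# e.g. word_to_index['king'] -> 3 (hypothetical index).
king = embedding(torch.tensor([3]))
print(king.shape)  # torch.Size([1, 100])
```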

If it helps, you can have a look at my code for that. You only need the create_embedding_matrix method – load_glove and generate_embedding_matrix were my initial solution, but there’s no need to load and store all word embeddings, since you only need those that match your vocabulary.

The word_to_index and max_index reflect the information from your vocabulary, with word_to_index mapping each word to a unique index in 0..max_index (now that I’ve written it, you probably don’t need max_index as an extra parameter). I use my own implementation of a vectorizer, but torchtext should give you similar information.

A full example of how it works can be seen in this notebook.
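
For reference, the core of such a create_embedding_matrix method could look roughly like this. This is a sketch rather than the exact code from the repo; the GloVe file name and the zero-initialization for out-of-vocabulary words are assumptions:

```python
import numpy as np
import torch

def create_embedding_matrix(glove_path, word_to_index, max_index, embedding_dim):
    # One row per vocabulary index; words missing from the GloVe file
    # keep zeros (you could also initialize them randomly).
    embedding_matrix = np.zeros((max_index + 1, embedding_dim))
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            word, *values = line.rstrip().split(' ')
            # Only keep embeddings for words in our vocabulary.
            if word in word_to_index:
                embedding_matrix[word_to_index[word]] = np.asarray(values, dtype=np.float32)
    return torch.from_numpy(embedding_matrix).float()

# Usage (file name assumed):
# weights = create_embedding_matrix('glove.6B.100d.txt', word_to_index, max_index, 100)
# embedding = nn.Embedding.from_pretrained(weights)
```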

To your other questions:

  • There’s hardly ever one best solution out there, and new types of embeddings are proposed on probably a weekly basis. My tip would be: just get something running, see how it works, and then try different alternatives to compare.
  • Of course you can get the embedding for a specific word. That’s essentially the content of the GloVe files. Each line contains first the word and then the n values of the embedding vector (with n being the vector size, e.g., 50, 100, 300); see the sketch after this list.
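
Because of that plain-text format, you can also grab a single word’s vector without PyTorch at all. A quick sketch (file name assumed; returns None if the word isn’t in the file):

```python
def get_vector(glove_path, target_word):
    # Scan the GloVe text file line by line for one word's vector.
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            word, *values = line.rstrip().split(' ')
            if word == target_word:
                return [float(v) for v in values]
    return None

# king = get_vector('glove.6B.100d.txt', 'king')  # list of e.g. 100 floats
```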

I get the idea, thanks for the clarification :slight_smile:

hahaha :stuck_out_tongue: thanks for the help @ptrblck