Which will be faster: retrieving each word vector from a word-embedding dictionary, or from nn.Embedding()?

To get the embeddings of a sequence, if I use a Python dictionary, I have to hit the dictionary once for each word. But if I copy the vectors into nn.Embedding(), I only have to send the word indices for each sequence, and I can even get the word embeddings of several sequences in one call, for example:

import torch, torch.nn as nn
embedding = nn.Embedding(10, 3)  # e.g. a 10-word vocabulary with 3-dim vectors
input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
embedding(input)  # one call returns a (2, 4, 3) tensor of word vectors

But PyTorch's embedding uses a numpy matrix as its lookup table.
So my question is: to execute the previous code, won't there be 8 hits on the numpy matrix? Or will they be retrieved in parallel? Even if it runs in parallel, there has to be another dictionary to convert words to indices, and that also needs 8 hits on the word2indices dictionary.
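Concretely, the word-to-index step I mean would look something like this (a toy sketch; the words and the word2idx mapping are made up for illustration):

import torch

word2idx = {"the": 1, "cat": 2, "sat": 4, "on": 5, "a": 3, "mat": 9}  # hypothetical vocabulary
sentence = ["the", "cat", "sat", "on"]
indices = torch.LongTensor([word2idx[w] for w in sentence])  # one dictionary hit per word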

I am asking these questions because of my original question. Which will be the faster process for a sequence (see the timing sketch below)?

  1. Getting each word vector from the word-vector dictionary, one word at a time.

  2. Getting the indices from the word2indices dictionary and then running code like the snippet above.
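To make the comparison concrete, here is a minimal timing sketch of both options (all sizes, names, and data are made up for illustration):

import time
import torch
import torch.nn as nn

vocab_size, dim, seq_len = 10000, 300, 512
words = [f"word{i}" for i in range(vocab_size)]
word2idx = {w: i for i, w in enumerate(words)}       # word -> index
vec_dict = {w: torch.randn(dim) for w in words}      # option 1: word -> vector
embedding = nn.Embedding(vocab_size, dim)            # option 2: index -> vector
sequence = [words[i % vocab_size] for i in range(seq_len)]

# Option 1: one dictionary hit per word, concatenated by hand
start = time.perf_counter()
out1 = torch.stack([vec_dict[w] for w in sequence])
t1 = time.perf_counter() - start

# Option 2: word2indices hits to build the index tensor, then one embedding call
start = time.perf_counter()
idx = torch.LongTensor([word2idx[w] for w in sequence])
out2 = embedding(idx)
t2 = time.perf_counter() - start

print(f"dict lookup + stack: {t1:.6f}s  |  word2indices + nn.Embedding: {t2:.6f}s")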

Hi,

We don’t use numpy objects to store things, only PyTorch Tensors.
The good thing about indexing is that a single indexing operation returns all the results already concatenated, whereas with a Python dictionary you would have to do that by hand.
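For illustration, here is a minimal sketch of that difference (toy sizes, made up): one indexing operation on the weight Tensor versus assembling the same result by hand from a dictionary:

import torch

weight = torch.randn(10, 3)                    # embedding weights stored as a Tensor
vec_dict = {i: weight[i] for i in range(10)}   # the same vectors in a Python dictionary
input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])

fast = weight[input]                           # single indexing op, shape (2, 4, 3)
slow = torch.stack([torch.stack([vec_dict[int(i)] for i in row]) for row in input])
assert torch.equal(fast, slow)                 # same result, assembled by hand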
