To get embeddings for a sequence, if I use a Python dictionary I have to hit the dictionary once per word. But if I copy the vectors into nn.Embedding(), I only have to send the word indices for each sequence, and I can even get the embeddings of several sequences in a single call, for example:
input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
But the PyTorch embedding layer still uses a dense weight matrix (a torch tensor) as its lookup table.
So my question is: to execute the code above, won't there be 8 lookups into that matrix (one per index)? Or are they retrieved in parallel? And even if the matrix lookup runs in parallel, there still has to be another dictionary to convert words to indices, and that needs 8 hits on the word2idx dictionary.
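As a small check of the batched call described above, here is a minimal sketch (the table size and embedding dimension are made up): one forward call on nn.Embedding gathers every requested row of the weight tensor in a single vectorized operation, rather than in a Python-level loop.

```python
import torch
import torch.nn as nn

# Hypothetical table: vocabulary of 10 words, 3-dim vectors
emb = nn.Embedding(10, 3)

# One batched call looks up all 8 indices at once
inp = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
out = emb(inp)
print(out.shape)  # torch.Size([2, 4, 3]) — one vector per index
```

The gather itself happens inside a single C-level kernel, so there is no per-word Python overhead on the matrix side; the per-word work that remains is the word-to-index conversion.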
I am asking all of this because of my original question: which of the following is faster?
For a sequence:
1. getting each word vector directly from a word-to-vector dictionary, or
2. getting the indices from the word2idx dictionary and then running the batched code above.
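The two options can be sketched side by side as follows. This is only an illustration under assumed names (`word2idx`, `vec_dict`, and the tiny vocabulary are hypothetical); both paths still pay one Python dictionary hit per word, and they produce the same vectors either way:

```python
import torch
import torch.nn as nn

sentence = ["the", "cat", "sat", "mat"]              # hypothetical sequence
word2idx = {"the": 0, "cat": 1, "sat": 2, "mat": 3}  # hypothetical vocab

emb = nn.Embedding(len(word2idx), 3)

# Option 1: a plain Python dict of word -> vector, one lookup per word
vec_dict = {w: emb.weight[i].detach() for w, i in word2idx.items()}
vecs_1 = torch.stack([vec_dict[w] for w in sentence])

# Option 2: convert words to indices (still per-word), then one batched gather
idx = torch.LongTensor([word2idx[w] for w in sentence])
vecs_2 = emb(idx).detach()

print(torch.allclose(vecs_1, vecs_2))  # True — identical vectors
```

The usual argument for option 2 is that the per-word cost is only an integer lookup, while the heavy work (fetching and stacking the vectors) happens in one vectorized call instead of a Python loop over tensors.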