Hi,
I am following a seq2seq tutorial:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
I want to use pre-trained vectors (Word2Vec) instead of the word2index mapping used in the tutorial. I have edited the code so that it stores the vector of each word rather than its index:
from gensim.models import KeyedVectors

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        # Load the pre-trained vectors once here; calling get_word2vec()
        # inside addWord would re-read the whole file for every word
        self.word2vec = self.get_word2vec()

    def get_word2vec(self):
        return KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # Store the word's 300-dim vector instead of an integer index
            self.word2index[word] = self.word2vec[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
Each word vector in this Word2Vec model has 300 dimensions.
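One alternative I was also thinking about: maybe I should keep word2index as integer indices like in the tutorial, and instead copy the pre-trained 300-dim vectors into the weight matrix of the encoder/decoder embedding layers. A rough, untested sketch of what I mean (all names here are just placeholders I made up, and `lang` is assumed to be a Lang instance built from the whole corpus):

import torch
import torch.nn as nn

hidden_size = 300  # must match the Word2Vec dimensionality

# Build a weight matrix with one row per vocabulary word
weights = torch.zeros(lang.n_words, hidden_size)
for index, word in lang.index2word.items():
    if word in lang.word2vec:          # SOS, EOS and unknown words stay zero vectors
        weights[index] = torch.tensor(lang.word2vec[word])

embedding = nn.Embedding.from_pretrained(weights, freeze=False)
# then use this embedding inside EncoderRNN / DecoderRNN instead of
# nn.Embedding(input_size, hidden_size)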
- Is this the right way to use the pre-trained vectors?
- Do I need to change anything else in my encoder/decoder network?
Thank you!