I trained a custom word2vec embedding file named "word2vec.txt" and would like to use it in TEXT.build_vocab(train_data, vectors=Vectors("word2vec.txt"))
where train_data is my training data as a torchtext Dataset.
But I get this error:
Vector for token b'\xc2\xa0' has 301 dimensions, but previously read vectors have 300 dimensions. All vectors must have the same number of dimensions.
I have checked my embedding file, and all vectors have 300 dimensions. If I switch to a pre-trained GloVe file instead, it works without any issue.
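For reference, here is a minimal sketch of how the file could be re-checked in a way that might reproduce the loader's count. It assumes the loader reads each line as bytes and splits on a single ASCII space, which is my understanding of how torchtext's Vectors parses the file, not something I have confirmed. Under that assumption, a token that itself contains a space would be cut in two and yield one extra field. Note that b'\xc2\xa0' is the UTF-8 encoding of a non-breaking space (U+00A0). The helper name find_bad_lines is hypothetical.

```python
def find_bad_lines(path, expected_dim=300):
    """Return (line_number, first_field, n_values) for every line whose
    number of values differs from expected_dim.

    Assumption: fields are separated by a single ASCII space and the
    file is read as raw bytes, so non-ASCII whitespace is not stripped.
    """
    bad = []
    with open(path, "rb") as f:
        for line_no, line in enumerate(f, start=1):
            # bytes.rstrip() removes only ASCII whitespace (e.g. b"\n")
            entries = line.rstrip().split(b" ")
            token, values = entries[0], entries[1:]
            if len(values) == 1:
                # Probably the word2vec header line: "<vocab_size> <dim>"
                continue
            if len(values) != expected_dim:
                bad.append((line_no, token, len(values)))
    return bad


if __name__ == "__main__":
    # "word2vec.txt" is the file name from the question
    for line_no, token, n in find_bad_lines("word2vec.txt"):
        print(f"line {line_no}: token {token!r} has {n} values")
```

If this reports any lines, the tokens on those lines likely contain a space (or the line has stray whitespace), even though every vector looks 300-dimensional when the file is inspected in a text editor.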