Hi guys, I’m super new to the torchtext library and am currently learning it.
I’ve read some tutorials on torchtext.data, but I’m still not sure what I should do for some use cases.
For example, a common NLP task is to train a word2vec model, and I’m not sure what the correct and most efficient way is to prepare and load the training data for it.
I tried the most naive approach: writing an ad hoc Python script to preprocess the corpus into TSV format, like this:
anarchism	originated	1
anarchism	as	1
anarchism	a	1
anarchism	term	1
anarchism	of	1
anarchism	race	0
anarchism	one	0
anarchism	from	0
anarchism	details	0
anarchism	hereditary	0
The 1st column is the target word, the 2nd is the context word, and the 3rd is the label (0 means a negative sample).
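For reference, rows in this three-column layout can be produced lazily rather than written out in full. The following is only a minimal sketch in plain Python, not torchtext API: the function name `skipgram_rows` and the parameters `window`, `num_negatives`, and `rng` are hypothetical, and negatives are drawn uniformly from the vocabulary for simplicity (word2vec itself samples from a smoothed unigram distribution).

```python
import random

def skipgram_rows(tokens, window=2, num_negatives=1, rng=None):
    """Yield (target, context, label) rows one at a time.

    label 1 = pair observed inside the window, label 0 = negative sample.
    """
    rng = rng or random.Random()
    vocab = sorted(set(tokens))
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a word is not its own context
            yield target, tokens[j], 1
            for _ in range(num_negatives):
                # Assumption: uniform sampling; a drawn word may occasionally
                # coincide with a true context word.
                yield target, rng.choice(vocab), 0

tokens = "anarchism originated as a term of abuse".split()
rows = list(skipgram_rows(tokens, window=2, num_negatives=1, rng=random.Random(0)))
```

Because the function is a generator, the pairs never need to exist in memory or on disk all at once; `list(...)` is used here only to inspect a tiny example.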
Then I tried to load this preprocessed corpus via
But this did not work: it used so much memory and time that it never finished.
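One way to sidestep the memory problem, independent of torchtext, is to stream the TSV file in fixed-size batches instead of loading it whole. This is only a sketch under assumptions: `iter_tsv_batches` is a hypothetical helper, the file is assumed to have exactly three tab-separated columns per row, and an in-memory `io.StringIO` stands in for the real file.

```python
import csv
import io
from itertools import islice

def iter_parsed_rows(lines):
    """Parse tab-separated lines lazily into (target, context, label) tuples."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        target, context, label = row
        yield target, context, int(label)

def iter_tsv_batches(lines, batch_size):
    """Yield lists of at most batch_size rows; only one batch is held at a time."""
    parsed = iter_parsed_rows(lines)
    while True:
        batch = list(islice(parsed, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for open("corpus.tsv", newline=""); the file name is hypothetical.
sample = io.StringIO("anarchism\toriginated\t1\nanarchism\tas\t1\nanarchism\trace\t0\n")
batches = list(iter_tsv_batches(sample, batch_size=2))
```

With a real file, iterating over `iter_tsv_batches(f, batch_size)` directly keeps memory use bounded by the batch size. The same generator could, in principle, be wrapped in a `torch.utils.data.IterableDataset` to feed a training loop.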
Any comments or answers would be appreciated: what is the right way to do this?