I want to build a vocabulary from my training and test datasets using torchtext. Of course, I can do it as follows:

    TEXT.build_vocab(train, test)

where TEXT is a Field object, and train and test are Dataset objects. However, I do not have enough memory to load the training and test datasets at the same time. When I tried this instead:

    TEXT.build_vocab(train)
    del train
    TEXT.build_vocab(test)
    del test

the resulting vocab contains only the tokens from the test data, because the second call replaces the vocab built by the first one.
How can I build the vocab in two steps, so that I can free each dataset's memory as soon as its tokens have been counted?
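One possible approach is to separate counting from vocab construction: accumulate token frequencies into a single collections.Counter one split at a time, deleting each split after it has been counted, and only then build the vocab from the combined counts. The sketch below is plain Python so that it runs without torchtext installed. load_split, update_counts and build_itos are hypothetical helpers, and the toy token lists stand in for your real tokenized examples. The final ordering (specials first, then tokens by descending frequency, ties alphabetical) is only meant to resemble what legacy torchtext does, not to reproduce it exactly.

```python
from collections import Counter
from typing import Iterable, List


def update_counts(counter: Counter, examples: Iterable[List[str]]) -> None:
    """Add the token frequencies of one dataset split to a shared counter."""
    for tokens in examples:
        counter.update(tokens)


def build_itos(counter: Counter, specials: List[str], min_freq: int = 1) -> List[str]:
    """Index-to-string list: specials first, then tokens sorted by
    descending frequency, with ties broken alphabetically."""
    words = sorted(
        (w for w, c in counter.items() if c >= min_freq and w not in specials),
        key=lambda w: (-counter[w], w),
    )
    return list(specials) + words


def load_split(name: str) -> List[List[str]]:
    """Hypothetical loader standing in for reading one tokenized split."""
    data = {
        "train": [["the", "cat", "sat"], ["the", "dog", "ran"]],
        "test": [["a", "cat", "ran"], ["the", "bird"]],
    }
    return data[name]


counter = Counter()

train = load_split("train")
update_counts(counter, train)
del train  # only the counts are kept; the split itself can be freed

test = load_split("test")
update_counts(counter, test)
del test

itos = build_itos(counter, specials=["<unk>", "<pad>"])
stoi = {w: i for i, w in enumerate(itos)}
print(itos)
```

With the Field-based (legacy) torchtext API, the token lists would presumably come from each example's field attribute, e.g. getattr(example, "text") if the field was registered under the name "text" (an assumption about your dataset). The combined counter could then be passed to the Vocab class, roughly TEXT.vocab = torchtext.vocab.Vocab(counter, specials=[...]), instead of calling TEXT.build_vocab. The exact constructor arguments differ between torchtext versions, so check the signature in the version you have installed.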