Torchtext - can we save the processed dataset and fields?


I’m currently using torchtext, but I found that creating a Dataset object and calling the Field’s build_vocab takes quite a long time, especially when the tokenizer is complicated. However, I could not save them with pickle. Is there a way to save the processed dataset and fields, so that data loading is faster on later runs?



Hi, did you find a good way to do it? I have found this approach for the Field, however:

from torchtext import data

TEXT = data.Field(sequential=True, tokenize=tokenizer, lower=True, fix_length=200, batch_first=True)
with open("model/TEXT.Field", "wb") as f:

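For anyone who wants to try a full save/load round trip, here is a rough sketch. It uses only the standard library pickle and a stand-in class (FieldLike, a hypothetical name) instead of the real data.Field, and the file path is made up. It also shows one common reason pickling fails: a lambda tokenizer cannot be pickled, while a named module-level function can. Whether that is what went wrong in the original question is only a guess.

```python
import os
import pickle
import tempfile


def whitespace_tokenizer(text):
    """Module-level function, so pickle can store a reference to it by name."""
    return text.split()


class FieldLike:
    """Hypothetical stand-in for a torchtext Field: a tokenizer plus a vocab."""

    def __init__(self, tokenize, lower=True):
        self.tokenize = tokenize
        self.lower = lower
        self.vocab = {}

    def build_vocab(self, texts):
        # Assign each new token the next free integer id.
        for text in texts:
            if self.lower:
                text = text.lower()
            for token in self.tokenize(text):
                self.vocab.setdefault(token, len(self.vocab))


corpus = ["Hello world", "hello torchtext"]

# A lambda tokenizer makes the whole object unpicklable.
bad_field = FieldLike(tokenize=lambda s: s.split())
bad_field.build_vocab(corpus)
try:
    pickle.dumps(bad_field)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

# A named, module-level tokenizer pickles without trouble.
good_field = FieldLike(tokenize=whitespace_tokenizer)
good_field.build_vocab(corpus)

path = os.path.join(tempfile.mkdtemp(), "TEXT.Field")  # illustrative path
with open(path, "wb") as f:
    pickle.dump(good_field, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(lambda_pickled, restored.vocab == good_field.vocab)
```

If the tokenizer really has to be a lambda or a closure, a third-party serializer may cope better than pickle, but I have not verified that against every torchtext version.
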

But is there currently a way to save the dataset as well?
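
One possible workaround for the dataset, offered as an assumption rather than a torchtext feature: cache the tokenized examples as plain Python lists and rebuild the Dataset from that cache on later runs. The names load_or_build and slow_tokenize and the cache path below are hypothetical.

```python
import os
import pickle
import tempfile

stats = {"calls": 0}


def slow_tokenize(text):
    """Placeholder for an expensive tokenizer; counts how often it runs."""
    stats["calls"] += 1
    return text.lower().split()


def load_or_build(raw_texts, cache_path):
    """Return tokenized examples, reading them from cache_path when it exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    examples = [slow_tokenize(text) for text in raw_texts]
    with open(cache_path, "wb") as f:
        pickle.dump(examples, f)
    return examples


raw_texts = ["The first sentence", "A second one"]
cache_path = os.path.join(tempfile.mkdtemp(), "examples.pkl")

first = load_or_build(raw_texts, cache_path)   # tokenizes and writes the cache
second = load_or_build(raw_texts, cache_path)  # reads the cache, no tokenizing
print(stats["calls"], first == second)
```

The cache goes stale if the raw data or the tokenizer changes, so the file has to be deleted by hand in that case.
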

PyTorch now provides an interface to save and load the processed vocab. You can use torch.save() and torch.load() directly on the TEXT Field object.
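
A minimal sketch of that round trip, assuming PyTorch is installed. To stay independent of the installed torchtext version it saves a plain dict (processed, a hypothetical stand-in for a Field's vocab) rather than a real Field. torch.save uses pickle by default, so the usual pickle limits still apply. On recent PyTorch releases torch.load defaults to weights_only=True, which rejects arbitrary Python objects, so the flag is set explicitly here.

```python
import os
import tempfile

import torch

# Hypothetical stand-in for the processed vocabulary of a Field.
processed = {
    "itos": ["<unk>", "<pad>", "hello", "world"],
    "stoi": {"<unk>": 0, "<pad>": 1, "hello": 2, "world": 3},
}

path = os.path.join(tempfile.mkdtemp(), "TEXT.Field")  # illustrative path
torch.save(processed, path)

# weights_only=False allows arbitrary pickled objects; only use it on files
# you created yourself, because unpickling can execute code.
restored = torch.load(path, weights_only=False)
print(restored == processed)
```
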