Hello,
How can I save and load the vocabulary created by build_vocab?
I would like to bump this. Using torch.load gives an error.
This seems to be a decent workaround:
def save_vocab(vocab, path):
    # Write one "index<TAB>token" line per vocabulary entry.
    with open(path, 'w+', encoding='utf-8') as f:
        for token, index in vocab.stoi.items():
            f.write(f'{index}\t{token}\n')
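Then:
def read_vocab(path):
    vocab = dict()
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # Strip the trailing newline so it does not end up in the token,
            # and split only on the first tab in case a token contains one.
            index, token = line.rstrip('\n').split('\t', 1)
            vocab[token] = int(index)
    return vocab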
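So you first define a field, for example:
words = Field(**args)  # args: the keyword arguments for your Field
Then, after calling words.build_vocab(), save the vocabulary with:
save_vocab(words.vocab, PATH)
And for loading:
words.vocab = read_vocab(PATH)
Note that read_vocab returns a plain token-to-index dict, not a Vocab object, so code that accesses words.vocab.stoi or words.vocab.itos will not work on it directly.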
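So, finally, you would have:
def save_vocab(vocab, path):
    # Write one "index<TAB>token" line per vocabulary entry.
    with open(path, 'w+', encoding='utf-8') as f:
        for token, index in vocab.stoi.items():
            f.write(f'{index}\t{token}\n')

def read_vocab(path):
    vocab = dict()
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # Strip the trailing newline so it does not end up in the token,
            # and split only on the first tab in case a token contains one.
            index, token = line.rstrip('\n').split('\t', 1)
            vocab[token] = int(index)
    return vocab

words = Field(**args)  # args: the keyword arguments for your Field
words.build_vocab(dataset, dataset, dataset, ...)
save_vocab(words.vocab, PATH)

words_loaded = Field(**args)
# The loaded vocabulary is a plain token-to-index dict, not a Vocab object.
words_loaded.vocab = read_vocab(PATH)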
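For anyone who wants to check the round trip without torchtext installed, here is a minimal sketch of the same idea. It makes a few assumptions that are not part of the workaround above: a SimpleNamespace with a stoi dict stands in for words.vocab, the file name vocab.tsv is arbitrary, and as_vocab is a hypothetical helper (not a torchtext function) that rebuilds an index-to-token list so the loaded object exposes both .stoi and .itos. It is not a full replacement for a real Vocab object, which carries more state than these two attributes.

```python
import os
import tempfile
from types import SimpleNamespace


def save_vocab(vocab, path):
    # Write one "index<TAB>token" line per entry of vocab.stoi.
    with open(path, 'w', encoding='utf-8') as f:
        for token, index in vocab.stoi.items():
            f.write(f'{index}\t{token}\n')


def read_vocab(path):
    stoi = {}
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # Drop the newline and split only on the first tab.
            index, token = line.rstrip('\n').split('\t', 1)
            stoi[token] = int(index)
    return stoi


def as_vocab(stoi):
    # Hypothetical helper: rebuild the index -> token list from the dict
    # and return a small object exposing .stoi and .itos.
    size = max(stoi.values(), default=-1) + 1
    itos = [None] * size
    for token, index in stoi.items():
        itos[index] = token
    return SimpleNamespace(stoi=stoi, itos=itos)


# Stand-in for words.vocab (assumption: only the .stoi mapping matters here).
original = SimpleNamespace(stoi={'<unk>': 0, '<pad>': 1, 'hello': 2, 'world': 3})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'vocab.tsv')
    save_vocab(original, path)
    loaded = as_vocab(read_vocab(path))

# The mapping survives the round trip, and indices can be decoded again.
assert loaded.stoi == original.stoi
assert loaded.itos[2] == 'hello'
```

One limitation of this format: a token that itself contains a newline would be split across two lines in the file and would not load back correctly.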