Hi,
I want to load two text datasets (A and B) with torchtext.
I build a vocabulary on A with the following code:
```python
from torchtext import data

# read data
TEXT = data.Field()
LABELS = data.Field(sequential=False)
train, val, test = data.TabularDataset.splits(
    path=args.data,
    train='train.csv',
    validation='valid.csv',
    test='test.csv',
    format='csv',
    fields=[('text', TEXT), ('label', LABELS)])
train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train, val, test),
    batch_sizes=(args.batch_size, 4 * args.batch_size, 4 * args.batch_size),
    sort_key=lambda x: len(x.text),
    device=0)
# build the vocabularies on A's training split only
TEXT.build_vocab(train.text, wv_type=args.wv_type, wv_dim=args.wv_dim)
LABELS.build_vocab(train.label)
```
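Conceptually, `build_vocab` counts the tokens in A's training split and gives each one an integer index; reusing that vocabulary for B means numericalizing B with the same mapping, so tokens never seen in A fall back to an "unknown" index. The sketch below mimics this in plain Python with a hypothetical `SimpleVocab` class (it is not torchtext's `Vocab`), and the specials `<unk>`/`<pad>` and their ordering are assumptions rather than torchtext's exact behavior.

```python
from collections import Counter


class SimpleVocab:
    """Hypothetical minimal vocabulary; a stand-in, not torchtext's Vocab."""

    def __init__(self, token_lists, specials=("<unk>", "<pad>"), min_freq=1):
        counter = Counter(tok for toks in token_lists for tok in toks)
        # Specials come first, then tokens by descending frequency
        # (ties broken alphabetically so the mapping is deterministic).
        self.itos = list(specials)
        for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
            if freq >= min_freq and tok not in specials:
                self.itos.append(tok)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}
        # Assumes "<unk>" is among the specials.
        self.unk_index = self.stoi["<unk>"]

    def numericalize(self, tokens):
        # Tokens unseen while building the vocabulary map to <unk>.
        return [self.stoi.get(tok, self.unk_index) for tok in tokens]


# Dataset A (toy data): the vocabulary is built on its training split only.
train_a = [["the", "cat", "sat"], ["the", "dog", "ran"]]
vocab = SimpleVocab(train_a)

# Dataset B (toy data) reuses A's vocabulary instead of building a new one.
text_b = ["the", "bird", "sat"]
ids_b = vocab.numericalize(text_b)
print(ids_b)  # [2, 0, 6]
```

Here "bird" does not occur in A, so it maps to index 0 (`<unk>`); every other index is identical to what A would produce.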
I want to use the same vocabulary on B instead of rebuilding a new one.
Is there a way to do this in torchtext?
- Can I dump the vocab built by torchtext to disk and load it back into a `Field`?
- Can I reuse the same `Field` objects when loading B?
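For the dump-and-load question, one approach is to serialize the vocabulary (for example the `TEXT.vocab` object) with `pickle` and, in the script that loads B, assign it back to the Field's `vocab` attribute before building B's iterators. That this round trip works cleanly for a given torchtext version is an assumption worth checking. The library-free sketch below shows the idea with a plain index-to-string list; `save_vocab`, `load_vocab`, and the file name `vocab_a.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical vocabulary built on dataset A: an index-to-string list plus
# its inverse mapping (stand-ins for what a vocab object holds).
itos = ["<unk>", "<pad>", "the", "cat", "sat"]
stoi = {tok: i for i, tok in enumerate(itos)}


def save_vocab(path, itos):
    # Persist only the index-to-string list; stoi can be rebuilt from it.
    with open(path, "wb") as f:
        pickle.dump(itos, f)


def load_vocab(path):
    with open(path, "rb") as f:
        loaded_itos = pickle.load(f)
    # Rebuild the string-to-index map from the loaded list.
    return loaded_itos, {tok: i for i, tok in enumerate(loaded_itos)}


# Round trip through a temporary directory (the file name is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab_a.pkl")
    save_vocab(path, itos)
    itos_b, stoi_b = load_vocab(path)

# Dataset B is numericalized with A's mapping; unknown tokens fall back to <unk>.
ids_b = [stoi_b.get(tok, stoi_b["<unk>"]) for tok in ["the", "dog", "sat"]]
print(ids_b)  # [2, 0, 4]
```

Saving only `itos` keeps the file independent of any library class, and the reverse map is cheap to rebuild on load, so the indices seen by a model trained on A stay the same for B.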
Thanks