I am currently working on sentiment analysis where the labels are unbalanced. I have found a similar solution for DataLoader, but haven't found anything for torchtext. My torchtext code looks like this:
import torch
from torchtext.data import Field, TabularDataset, BucketIterator

tokenize = lambda x: x.split()
TEXT = Field(sequential=True, tokenize=tokenize, lower=True, unk_token=None)
LABEL = Field(sequential=False, use_vocab=True, unk_token=None)

training, validation = TabularDataset.splits(
    path="./",
    train="training.csv", validation="validation.csv",
    format="csv",
    skip_header=True,
    fields=[('text', TEXT), ('labels', LABEL)])

TEXT.build_vocab(training, max_size=None)
LABEL.build_vocab(training)

# No test set, so only two iterators are returned
train_iter, val_iter = BucketIterator.splits(
    (training, validation),
    sort_key=lambda x: len(x.text),
    batch_size=BATCH_SIZE,
    device=device)
Unlike DataLoader, BucketIterator does not seem to accept a sampler argument; at least I can't find one in the torchtext documentation. How can I oversample the minority class (or otherwise rebalance batches) when iterating with torchtext?
Thanks in advance.
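For reference, the DataLoader-based approach I mentioned uses WeightedRandomSampler to draw minority-class samples more often. This is a sketch with a toy, made-up imbalanced dataset (90/10 split), not my actual data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)  # for reproducibility of the sampling

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 8)
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class,
# so both classes are drawn with roughly equal probability
class_counts = torch.bincount(labels)                 # tensor([90, 10])
sample_weights = 1.0 / class_counts[labels].float()   # one weight per sample

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(dataset),
                                replacement=True)

# sampler and shuffle are mutually exclusive, so shuffle is left off
loader = DataLoader(dataset, batch_size=20, sampler=sampler)
```

Iterating over `loader` then yields batches that are approximately class-balanced, which is exactly what I would like to replicate with the torchtext iterators.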