I am currently working on sentiment analysis where the labels are unbalanced. I have found a similar solution for DataLoader, but haven't found anything for torchtext. My torchtext code looks like this:
import torch
from torchtext.data import Field, TabularDataset, BucketIterator

tokenize = lambda x: x.split()
TEXT = Field(sequential=True, tokenize=tokenize, lower=True, unk_token=None)
LABEL = Field(sequential=False, use_vocab=True, unk_token=None)

training, validation = TabularDataset.splits(
    path="./",
    train="training.csv", validation="validation.csv",
    format="csv",
    skip_header=True,
    fields=[('text', TEXT), ('labels', LABEL)])

TEXT.build_vocab(training, max_size=None)
LABEL.build_vocab(training)

# No test set, so only two iterators are returned
train_iter, val_iter = BucketIterator.splits(
    (training, validation),
    sort_key=lambda x: len(x.text),
    batch_size=BATCH_SIZE,
    device=device)
Unlike DataLoader, BucketIterator does not seem to accept a sampler argument; at least I can't find one in the torchtext documentation. How can I oversample the minority class (or otherwise rebalance batches) when iterating with torchtext?
Thanks in advance.
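For reference, the DataLoader-based approach I mentioned uses WeightedRandomSampler to draw minority-class samples more often. This is a sketch with a toy, made-up imbalanced dataset (90/10 split), not my actual data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)  # for reproducibility of the sampling

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 8)
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class,
# so both classes are drawn with roughly equal probability
class_counts = torch.bincount(labels)                 # tensor([90, 10])
sample_weights = 1.0 / class_counts[labels].float()   # one weight per sample

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(dataset),
                                replacement=True)

# sampler and shuffle are mutually exclusive, so shuffle is left off
loader = DataLoader(dataset, batch_size=20, sampler=sampler)
```

Iterating over `loader` then yields batches that are approximately class-balanced, which is exactly what I would like to replicate with the torchtext iterators.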