[Torchtext] Delete rows with specific label

Hello,

I’m currently using the SNLI dataset with the corresponding split method from torchtext:

datasets.nli.NLIDataset.splits(...)

I would like to filter out examples that have a specific label. My idea was to use a Field object that filters during the preprocessing step, but I did not find any similar approaches online. I’m unsure whether removing a complete row is even possible and, if it is, how the label Field object can notify the text Field object.

Is this something torchtext is capable of doing, or should I filter beforehand?

Thanks in advance!

The preprocessing step helped but did not solve the problem on its own. This specific splits method returns NLIDatasets with a filter attribute that kicks out examples with one specific label ( ‘-’ ). The hack is to map the unwanted label (here ‘neutral’) to ‘-’ in the label field’s preprocessing, so that the existing filter removes those rows as well. The right way to do this would probably involve using TabularDataset and defining fields similar to the NLIDataset ones, but since ‘.jsonl’ is not supported and an easy, hacky way seems to work, I will use this version for now.

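from torchtext import data

# Map 'neutral' to '-' so the dataset's built-in filter drops it along with the unlabeled rows.
class_labels = {'contradiction': 0, 'neutral': '-', 'entailment': 1, '-': '-'}
label_tokenizer = lambda x: class_labels[x]
label_field = data.Field(sequential=False, batch_first=True, preprocessing=label_tokenizer)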
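As far as I can tell, this works because the label preprocessing runs while each example is built, and the dataset's filter is applied afterwards, so every label remapped to '-' is dropped together with the original '-' rows. Here is a minimal sketch of that order of operations in plain Python. It is not torchtext code: the sample rows and the helper names preprocess_label and keep_example are made up for illustration.

```python
# Plain-Python imitation of the mechanism (no torchtext involved).
class_labels = {'contradiction': 0, 'neutral': '-', 'entailment': 1, '-': '-'}


def preprocess_label(raw_label):
    # Stand-in for the label field's preprocessing step (hypothetical helper).
    return class_labels[raw_label]


def keep_example(example):
    # Stand-in for the dataset's filter: keep everything not labeled '-'.
    return example['label'] != '-'


# Made-up sample rows: (premise, raw label).
raw_rows = [
    ("A man sleeps.", "contradiction"),
    ("A man rests.", "neutral"),
    ("A man is outside.", "entailment"),
    ("No consensus.", "-"),
]

# Step 1: build examples, applying the label preprocessing.
examples = [
    {'premise': premise, 'label': preprocess_label(label)}
    for premise, label in raw_rows
]

# Step 2: apply the filter to the already-preprocessed examples.
kept = [ex for ex in examples if keep_example(ex)]
kept_labels = [ex['label'] for ex in kept]
```

Only the 'contradiction' and 'entailment' rows survive, because both 'neutral' and '-' end up as '-' before the filter runs.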

https://pytorch.org/text/_modules/torchtext/datasets/nli.html
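For the "filter beforehand" route from the original question, a rough sketch using only the standard library could look like the following. It assumes each line of the .jsonl file is one JSON object with the label stored under the key gold_label, as in the SNLI files; the function name filter_jsonl_lines and the sample lines are made up for illustration.

```python
import json


def filter_jsonl_lines(lines, drop_labels, label_key="gold_label"):
    """Yield only the JSON lines whose label is not in drop_labels."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(label_key) not in drop_labels:
            yield line


# Made-up sample lines in the assumed format.
sample = [
    '{"gold_label": "neutral", "sentence1": "A", "sentence2": "B"}',
    '{"gold_label": "entailment", "sentence1": "C", "sentence2": "D"}',
    '{"gold_label": "-", "sentence1": "E", "sentence2": "F"}',
    '{"gold_label": "contradiction", "sentence1": "G", "sentence2": "H"}',
]

filtered = list(filter_jsonl_lines(sample, drop_labels={"neutral", "-"}))
filtered_labels = [json.loads(line)["gold_label"] for line in filtered]
```

Since an open file is also an iterable of lines, the same function can be fed a file handle, and the yielded lines written to a new file that is then loaded as usual.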