SequenceTaggingDataset equivalent with the new torchtext version

giturra · June 2, 2023, 2:30pm

Hello everyone,

I am preparing a dataset for a sequence labeling task, but I noticed that the torchtext API was updated and the legacy package has been deprecated. Is there a way to load the train, validation, and test splits in one call, similar to the legacy.datasets.SequenceTaggingDataset.splits function?

I checked the documentation, but I couldn’t find anything similar.

Thank you for any advice or suggestions.

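For reference, this is the legacy call I am trying to replace:

from torchtext import legacy  # the legacy package is not present in newer torchtext releases

# fields is a list of (name, Field) pairs defined earlier, one per column in the files
train_data, valid_data, test_data = legacy.datasets.SequenceTaggingDataset.splits(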
    path="./",
    train="train.txt",
    validation="dev.txt",
    test="test.txt",
    fields=fields,
    encoding="utf-8",
    separator=" "
)
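
To make the question concrete, here is a minimal sketch of the behavior I am after, written in plain Python without torchtext. It is only an illustration under assumptions, not an official replacement: the helper names `read_tagged_file` and `load_splits` are hypothetical, and it assumes one token per line, columns split by the separator, the token in the first column, the tag in the last column, and a blank line between sentences.

```python
from pathlib import Path
from typing import List, Tuple

# One example is a pair: (tokens of a sentence, tags of a sentence).
Example = Tuple[List[str], List[str]]


def read_tagged_file(path, separator=" ", encoding="utf-8") -> List[Example]:
    """Read a column-formatted file into a list of (tokens, tags) pairs.

    Hypothetical helper. Assumes the token is the first column, the tag is
    the last column, and sentences are separated by blank lines.
    """
    examples: List[Example] = []
    tokens: List[str] = []
    tags: List[str] = []
    with open(path, encoding=encoding) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                # A blank line closes the current sentence, if there is one.
                if tokens:
                    examples.append((tokens, tags))
                    tokens, tags = [], []
                continue
            columns = line.split(separator)
            tokens.append(columns[0])
            tags.append(columns[-1])
    # Keep the last sentence when the file does not end with a blank line.
    if tokens:
        examples.append((tokens, tags))
    return examples


def load_splits(root, train="train.txt", validation="dev.txt",
                test="test.txt", separator=" ", encoding="utf-8"):
    """Return (train, validation, test) example lists, mirroring .splits()."""
    root = Path(root)
    return tuple(
        read_tagged_file(root / name, separator, encoding)
        for name in (train, validation, test)
    )
```

With something like this, `train_data, valid_data, test_data = load_splits("./")` would give three lists of (tokens, tags) pairs. Building the vocabulary and batching would still have to be handled separately, which is the part I am unsure how to do with the new API.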