Hello everyone,
I am preparing a dataset for a sequence labeling task, but I noticed that the torchtext API was updated, deprecating the legacy
package. So my question is: is there a way to get the dataset splits (train, valid, and test) similar to the legacy.datasets.SequenceTaggingDataset.splits
function?
I checked the documentation, but I couldn’t find anything similar.
Thank you for any advice or suggestion.
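For reference, this is the legacy call I am using (fields is defined elsewhere):

    from torchtext import legacy  # only present in torchtext releases that still ship the legacy package

    train_data, valid_data, test_data = legacy.datasets.SequenceTaggingDataset.splits(
        path="./",
        train="train.txt",
        validation="dev.txt",
        test="test.txt",
        fields=fields,
        encoding="utf-8",
        separator=" ",
    )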
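
In case it helps frame the question, here is a minimal sketch of the behavior I am after, written in plain Python without torchtext. It assumes the files have one token per line, with columns separated by the separator and a blank line between sentences. The names read_tagged_file and load_splits are hypothetical helpers I made up for illustration, not part of any torchtext API, and the sketch does not build vocabularies or tensors the way fields did.

```python
from pathlib import Path


def read_tagged_file(path, separator=" ", encoding="utf-8"):
    """Hypothetical helper: parse one column-formatted file.

    Returns a list of examples; each example is a tuple with one list
    per column, e.g. (tokens, tags) for a two-column file.
    """
    examples = []
    columns = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                # A blank line closes the current sentence.
                if columns:
                    examples.append(tuple(columns))
                    columns = []
                continue
            for i, value in enumerate(line.split(separator)):
                if len(columns) <= i:
                    columns.append([])
                columns[i].append(value)
    # Keep the last sentence if the file does not end with a blank line.
    if columns:
        examples.append(tuple(columns))
    return examples


def load_splits(path, train="train.txt", validation="dev.txt",
                test="test.txt", **kwargs):
    """Hypothetical helper: read the three files and return them in order."""
    root = Path(path)
    return tuple(
        read_tagged_file(root / name, **kwargs)
        for name in (train, validation, test)
    )
```

With this, load_splits("./", separator=" ") would return the train, validation, and test examples as three plain lists, which could then be wrapped in a torch.utils.data.Dataset. What I am really looking for is whether the new torchtext API already offers something equivalent.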