What are some common datasets for NLP, equivalent to MNIST or CIFAR for vision?

We have torchvision for vision datasets, but is there an equivalent library for NLP tasks?
Something that contains datasets ranging in complexity from simpler to harder ones, like MNIST → CIFAR → ImageNet in vision?

Hi Kirk, taking a look at the datasets within torchtext would give you a good indication!

In my mind, datasets like AG News (news topic classification) or IMDB (sentiment classification) are about as popular a starting point for NLP as MNIST is for vision. But it really depends on the task you’re trying to achieve.
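If it helps, here is a rough sketch of what loading AG News through torchtext and checking its label balance might look like. It assumes an installed torchtext version that exposes `torchtext.datasets.AG_NEWS` with a `split` argument and yields `(label, text)` pairs; depending on the version, extra packages may be needed. The helper names `load_ag_news` and `label_counts` are made up for illustration and are not part of torchtext.

```python
from collections import Counter
from itertools import islice


def load_ag_news(split="train"):
    """Return an iterable of (label, text) pairs from torchtext's AG_NEWS.

    Assumption: the installed torchtext provides torchtext.datasets.AG_NEWS
    and accepts a `split` argument. The data is downloaded on first use.
    """
    # Imported lazily so label_counts below works without torchtext installed.
    from torchtext.datasets import AG_NEWS

    return AG_NEWS(split=split)


def label_counts(examples, limit=None):
    """Count labels over the first `limit` (label, text) pairs.

    With limit=None the whole iterable is consumed.
    """
    return Counter(label for label, _ in islice(examples, limit))


# Hypothetical usage (needs torchtext and network access):
#   print(label_counts(load_ag_news("train"), limit=1000))
```

`label_counts` only relies on the `(label, text)` pair shape, so it can be tried on a small hand-written list before pointing it at a real dataset.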
