Torchtext alternative to TabularDataset, BucketIterator

Stephen_Fernandes · February 10, 2021, 6:56pm

Until now I've been using the torchtext BucketIterator and TabularDataset for machine translation,
but the problem is that BucketIterator cannot be used with TPUs: it doesn't accept a sampler, so DistributedSampler cannot be used with it. I also tried using it with Lightning but was stuck on only a single GPU.

Is there a better alternative DataLoader for a seq2seq translation task that can batch data by similar lengths like BucketIterator and is also compatible with distributed training on TPUs and GPUs?

I have a CSV file with parallel texts to be used for seq2seq translation.

mmg · June 21, 2021, 4:27am

This might help.

However, I haven’t used this on TPUs. Please let me know if it works (or not!)
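
For anyone landing here later, a minimal sketch of the general idea (length bucketing plus per-replica sharding) is below. It is not the approach from either post and not a torchtext or PyTorch API: `BucketBatchSampler` and its parameters (`bucket_mult`, `num_replicas`, `rank`, `seed`) are hypothetical names made up for illustration. It assumes you already know the token length of every example, and it has not been tested on TPUs.

```python
import random


class BucketBatchSampler:
    """Hypothetical helper: yields lists of dataset indices with similar lengths.

    Each replica (GPU or TPU core) sees a disjoint, equally sized slice of
    the batches, provided every replica uses the same seed and epoch.
    """

    def __init__(self, lengths, batch_size, num_replicas=1, rank=0,
                 bucket_mult=100, shuffle=True, seed=0):
        self.lengths = list(lengths)
        self.batch_size = batch_size
        self.num_replicas = num_replicas
        self.rank = rank
        # A bucket holds bucket_mult batches; sorting happens inside a bucket.
        self.bucket_size = batch_size * bucket_mult
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch so the shuffle differs between epochs.
        self.epoch = epoch

    def _all_batches(self):
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(len(self.lengths)))
        if self.shuffle:
            rng.shuffle(indices)
        batches = []
        for start in range(0, len(indices), self.bucket_size):
            bucket = indices[start:start + self.bucket_size]
            bucket.sort(key=lambda i: self.lengths[i])
            for b in range(0, len(bucket), self.batch_size):
                batches.append(bucket[b:b + self.batch_size])
        if self.shuffle:
            rng.shuffle(batches)
        # Drop trailing batches so every replica gets the same number.
        usable = len(batches) - len(batches) % self.num_replicas
        return batches[:usable]

    def __iter__(self):
        # Replica `rank` takes every num_replicas-th batch.
        return iter(self._all_batches()[self.rank::self.num_replicas])

    def __len__(self):
        total = -(-len(self.lengths) // self.batch_size)  # ceiling division
        return total // self.num_replicas
```

An object like this can be passed as `batch_sampler=` to a regular `torch.utils.data.DataLoader`, together with a `collate_fn` that pads each batch, and with `rank` and `num_replicas` taken from your distributed setup (for example `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`). On TPUs, batches with many different shapes can trigger repeated XLA recompilation, so padding to a small set of fixed lengths may be worth considering.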