How to make torchtext input data from scratch

lesshaste · December 3, 2020, 8:01pm

I am trying to modify https://github.com/bentrevett/pytorch-seq2seq/blob/master/5%20-%20Convolutional%20Sequence%20to%20Sequence%20Learning.ipynb to use my own training data.

The notebook creates the variables train_iterator, valid_iterator, and test_iterator with BucketIterator.splits. Each of them is a torchtext.data.iterator.BucketIterator.

In my case I have two long strings made up of sentences: one in plain English, and one in a dialect of English that is a translation of the first. Let us call these string1 and string2. How can I build train_iterator, valid_iterator, and test_iterator from these two strings?

mmg · June 21, 2021, 4:41am

This might help. It reads a CSV and builds ‘from scratch’
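
In case it is useful, here is a minimal sketch of the CSV route, using only the standard library. It assumes string1 and string2 hold one sentence per line and are aligned line by line. The helper names (split_sentences, write_parallel_csvs), the column names src and trg, and the 80/10/10 split are made up for illustration and do not come from the notebook.

```python
import csv
import random
from pathlib import Path


def split_sentences(text):
    """Naive splitter: treats every non-empty line as one sentence (an assumption)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_parallel_csvs(string1, string2, out_dir,
                        valid_frac=0.1, test_frac=0.1, seed=0):
    """Pair up the sentences of two aligned strings and write
    train.csv, valid.csv and test.csv into out_dir.

    Returns a dict with the number of sentence pairs in each split.
    """
    src = split_sentences(string1)
    trg = split_sentences(string2)
    if len(src) != len(trg):
        raise ValueError(
            f"sentence counts differ: {len(src)} vs {len(trg)}")

    # Shuffle the pairs together so source and target stay aligned.
    pairs = list(zip(src, trg))
    random.Random(seed).shuffle(pairs)

    n_valid = int(len(pairs) * valid_frac)
    n_test = int(len(pairs) * test_frac)
    splits = {
        "valid": pairs[:n_valid],
        "test": pairs[n_valid:n_valid + n_test],
        "train": pairs[n_valid + n_test:],
    }

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.csv", "w",
                  newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["src", "trg"])  # header row
            writer.writerows(rows)

    return {name: len(rows) for name, rows in splits.items()}


# With the legacy torchtext API that the notebook uses, the files could then
# be loaded roughly as follows (not run here; SRC, TRG and device are assumed
# to be defined as in the notebook):
#
#   train_data, valid_data, test_data = TabularDataset.splits(
#       path="data", train="train.csv", validation="valid.csv",
#       test="test.csv", format="csv", skip_header=True,
#       fields=[("src", SRC), ("trg", TRG)])
#   train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
#       (train_data, valid_data, test_data), batch_size=128,
#       sort_key=lambda ex: len(ex.src), device=device)
```

Calling write_parallel_csvs(string1, string2, "data") would write the three files under data/. The commented part is only an outline: a TabularDataset does not define a sort key by default, so BucketIterator most likely needs the sort_key argument shown there, and the vocabularies still have to be built with SRC.build_vocab and TRG.build_vocab before iterating.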