Help with torchtext LanguageModelingDataset and BPTTIterator

nlpdl · July 14, 2018, 12:20pm

I would like to ask if anyone has worked with the torchtext LanguageModelingDataset and BPTTIterator.

I’m trying to read in raw text to train a char-rnn.

My text file has sentences of varying lengths:

The sky is blue today.
I feel like going for a walk and ice cream.
The day is gloomy, I am staying inside and napping.

I tried:

trn_data, vld_data, tst_data = datasets.LanguageModelingDataset.splits(TEXT, path="<path to file>",
    train='training.txt', validation="validation.txt", test='testing.txt')

but I get the error:

TypeError: splits() got multiple values for keyword argument 'path'

And then for the iterator I have:

train_iter, vld_iter, test_iter = data.BucketIterator.splits((trn_data,vld_data,tst_data),
                                                             batch_size=batchsize,
                                                             device=-1,
                                                             bptt_len=sequence_length,
                                                             shuffle=True,
                                                             repeat=False)

Thanks for any help you can provide, always appreciate it.

tom · July 16, 2018, 7:01am

For the error: TEXT is passed positionally, so it lands in the slot of the path parameter, and then you pass path again as a keyword. Python sees two values for the same parameter and raises the TypeError.
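
To make that concrete, here is a minimal sketch that reproduces the error without torchtext. FakeDataset is a hypothetical stand-in, not the library class. It assumes that splits takes path as its first parameter and forwards any extra keyword arguments to the constructor; check the signature in your installed torchtext version before relying on that. Under that assumption, the fix is to pass the field by keyword (text_field=TEXT) rather than positionally.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset class; not the torchtext API."""

    def __init__(self, path, text_field):
        self.path = path
        self.text_field = text_field

    @classmethod
    def splits(cls, path=None, train=None, validation=None, test=None, **kwargs):
        # Assumed behaviour: extra keyword arguments (here text_field)
        # are forwarded to the constructor for each split file.
        names = (train, validation, test)
        return tuple(cls(f"{path}/{name}", **kwargs)
                     for name in names if name is not None)


TEXT = object()  # placeholder for the real Field object

# Same call shape as in the question: TEXT fills the first positional
# slot (path), and path= is then supplied a second time by keyword.
try:
    FakeDataset.splits(TEXT, path="data", train="training.txt")
    error = None
except TypeError as exc:
    error = exc

# Passing everything by keyword avoids the clash.
trn, vld, tst = FakeDataset.splits(
    path="data",
    train="training.txt",
    validation="validation.txt",
    test="testing.txt",
    text_field=TEXT,
)
```

The first call fails for the same reason as the one in the question; the second binds each argument to exactly one parameter.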

Best regards

Thomas