Has anyone worked with the torchtext LanguageModelingDataset and BPTTIterator?
I’m trying to read in raw text to train a char-rnn.
My text file has sentences of varying lengths, for example:
The sky is blue today.
I feel like going for a walk and ice cream.
The day is gloomy, I am staying inside and napping.
I tried:
trn_data, vld_data, tst_data = datasets.LanguageModelingDataset.splits(
    TEXT, path="<path to file>",
    train='training.txt', validation="validation.txt", test='testing.txt')
but I get the error:
TypeError: splits() got multiple values for keyword argument 'path'
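A possible explanation, sketched in plain Python: this message is what Python raises when one parameter receives both a positional and a keyword value. The sketch assumes that `splits` declares `path` as its first parameter after `cls`, which seems to match the legacy torchtext `Dataset.splits` signature but is not verified here against a specific version. `FakeDataset`, `"data_dir"`, and the `text_field` keyword are hypothetical stand-ins for illustration, not the real torchtext class.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset class; not the torchtext API."""

    @classmethod
    def splits(cls, path=None, train=None, validation=None, test=None, **kwargs):
        # Return the bound arguments so the binding can be inspected.
        return {"path": path, "train": train, "validation": validation,
                "test": test, **kwargs}


TEXT = object()  # placeholder for a Field-like object

# The positional TEXT binds to `path`, so the keyword `path=` is a duplicate.
try:
    FakeDataset.splits(TEXT, path="data_dir", train="training.txt")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the field by keyword leaves `path` free for the directory.
bound = FakeDataset.splits(path="data_dir", train="training.txt", text_field=TEXT)
```

If the real `splits` binds its arguments the same way, passing the field as a keyword (for example `text_field=TEXT`) instead of positionally may avoid the error.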
and then for the iterator:
train_iter, vld_iter, test_iter = data.BPTTIterator.splits(
    (trn_data, vld_data, tst_data),
    batch_size=batchsize,
    device=-1,
    bptt_len=sequence_length,
    shuffle=True,
    repeat=False)
Thanks for any help you can provide, always appreciate it.