How to concat datasets for LSTM properly?

Mango_Freeze · August 24, 2020, 12:05am

I have a list of files that each contain data of equal length. To simplify, let’s say there are two files containing:
file1: [1, 2, 3, 4, 5]
file2: [6, 7, 8, 9, 10]

Currently I create a dataset for each file and pass a ConcatDataset() of them to my DataLoader. Let’s assume a sequence length of 2.
Now, if I use a batch size of 2, will the DataLoader return [[1, 2], [3, 4]] or [[1, 2], [6, 7]]?
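
A minimal sketch of this setup, for checking the batch order directly. It assumes each per-file dataset yields non-overlapping windows of the sequence length and that shuffling is off; `WindowDataset` is a hypothetical name, not something from the original code.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class WindowDataset(Dataset):
    """Hypothetical per-file dataset: non-overlapping windows of seq_len values."""

    def __init__(self, values, seq_len):
        self.values = torch.tensor(values)
        self.seq_len = seq_len

    def __len__(self):
        # Drop the incomplete trailing window (e.g. the lone 5 in file1).
        return len(self.values) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        return self.values[start:start + self.seq_len]


file1 = [1, 2, 3, 4, 5]
file2 = [6, 7, 8, 9, 10]
dataset = ConcatDataset([WindowDataset(file1, 2), WindowDataset(file2, 2)])

# With shuffle=False the loader walks the concatenated indices 0, 1, 2, 3 in order.
loader = DataLoader(dataset, batch_size=2, shuffle=False)
batches = [batch.tolist() for batch in loader]
print(batches)  # [[[1, 2], [3, 4]], [[6, 7], [8, 9]]]
```

Under these assumptions, ConcatDataset simply appends the second file's windows after the first file's, so the first batch is [[1, 2], [3, 4]].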

My desired batch is [[1, 2], [6, 7]], so that each row of the batch starts at the beginning of a file. The next batch, [[3, 4], [8, 9]], would then continue each row's sequence and stay consistent with the LSTM state carried over from the previous batch.
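
To make the desired order concrete, here is a rough sketch of the indices into the concatenated dataset that each batch would need to pull. It assumes every file produces the same number of non-overlapping windows and that the batch size equals the number of files; `interleaved_batches` is a hypothetical helper, not a PyTorch function.

```python
def interleaved_batches(windows_per_file, num_files):
    """Yield index batches for a concatenation of equal-length per-file datasets.

    Batch t holds window t of every file, so row i of each batch
    always continues file i.
    """
    for t in range(windows_per_file):
        # File f occupies indices [f * windows_per_file, (f + 1) * windows_per_file).
        yield [f * windows_per_file + t for f in range(num_files)]


# Two files, two windows each: indices 0, 1 belong to file1 and 2, 3 to file2.
index_batches = list(interleaved_batches(windows_per_file=2, num_files=2))
print(index_batches)  # [[0, 2], [1, 3]]
```

Indices [0, 2] correspond to [[1, 2], [6, 7]] and [1, 3] to [[3, 4], [8, 9]]. A list like this could presumably be passed to DataLoader through its `batch_sampler` argument (leaving `batch_size` and `shuffle` at their defaults), though whether that is the recommended approach is not something this sketch confirms.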

How can I achieve that?