Create a dataset from different sources

hect1995 · April 7, 2020, 1:23pm

I have several CSV files with temporal data, and from each CSV file I want to get minibatches (e.g. of size 200) of sequential data. How can I incorporate all these CSV files into a PyTorch dataset without mixing information from one CSV into another?

ptrblck · April 8, 2020, 8:01am

You could store all file paths to the .csv files in your Dataset's __init__ method and load each file separately using a modulo operation in __getitem__.
E.g. if file1 and file2 each contain 100 sliding windows, you could check whether 100 <= index < 200 and, if so, load window index - 100 from the second file.
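
A minimal sketch of that index lookup, assuming the number of windows per file is known up front. `locate` and `window_counts` are hypothetical names used for illustration, not part of PyTorch. With equally sized files a plain `divmod` is enough; the cumulative-offset version also handles files of different lengths.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(index, window_counts):
    """Map a global window index to (file_index, local_index).

    window_counts[i] is the number of sliding windows in file i
    (a hypothetical list, computed once, e.g. in __init__).
    """
    total = sum(window_counts)
    if not 0 <= index < total:
        raise IndexError(index)
    # offsets[i] is the global index one past the last window of file i.
    offsets = list(accumulate(window_counts))
    file_index = bisect_right(offsets, index)
    start = offsets[file_index - 1] if file_index > 0 else 0
    return file_index, index - start


# Equal-sized files: integer division and modulo give the same result.
print(locate(150, [100, 100, 100]))  # (1, 50)
print(divmod(150, 100))              # (1, 50)
```

Because every global index resolves to exactly one file, a window can never contain rows from two different CSVs.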

Have a look at this small example for a sliding window approach.
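
For the multi-file case, one possible sketch is below. It is not the linked example. `MultiCsvWindows`, `window` and `stride` are hypothetical names, and it assumes each CSV has a header row and only numeric columns. It uses only the standard library so it runs as-is; in a real project you would likely subclass `torch.utils.data.Dataset` and return tensors instead of nested lists.

```python
import csv
from bisect import bisect_right
from itertools import accumulate


class MultiCsvWindows:
    """Map-style dataset of sliding windows that never span two files.

    Hypothetical sketch: assumes every CSV has a header row and only
    numeric columns.
    """

    def __init__(self, paths, window=200, stride=1):
        self.paths = list(paths)
        self.window = window
        self.stride = stride
        counts = []
        for path in self.paths:
            with open(path, newline="") as f:
                n_rows = sum(1 for _ in csv.reader(f)) - 1  # minus header
            # Number of full windows that fit in this file (0 if too short).
            counts.append(max(0, (n_rows - window) // stride + 1))
        # offsets[i] is the global index one past the last window of file i.
        self.offsets = list(accumulate(counts))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the file this global index falls into, then the local start row.
        file_index = bisect_right(self.offsets, index)
        first = self.offsets[file_index - 1] if file_index else 0
        start = (index - first) * self.stride
        with open(self.paths[file_index], newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip header
            rows = [[float(x) for x in row] for row in reader]
        return rows[start:start + self.window]
```

This version re-reads the file on every `__getitem__` call, which keeps the sketch short but is slow; caching the parsed files, or loading them all in `__init__` if they fit in memory, would be a reasonable next step.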