If I had a bunch of different image datasets I would just concatenate them with ConcatDataset from torch.utils.data. With multiple time-series datasets I can't do that, because each one needs its own rolling window and the windows can't be allowed to run continuously from one dataset into the next.
Is it possible to solve this with a custom dataloader, or should I just iterate through a different pandas dataframe after each training episode?
Here is my custom Dataset class, which works with a 1D CNN on one time-series dataset at a time.
class MyDataset(Dataset):
    def __init__(self, data, window):
        self.data = data
        self.window = window
        print(data.tail())
        self.xData = torch.FloatTensor(data[['data_1', 'data_2', 'data_3']].values.astype('float'))
        # labels as LongTensor so they work directly with CrossEntropyLoss
        self.yData = torch.LongTensor(data['labels'].values.astype('int'))

    def __len__(self):
        return len(self.data) - self.window

    def __getitem__(self, index):
        target = self.yData[index + self.window]
        # transpose (window, channels) -> (channels, window) for the 1D CNN;
        # a reshape here would interleave the channels instead of separating them
        data_val = self.xData[index:index + self.window].t()
        return data_val, target
# split data into train test set
df_train, df_test = train_test_split(df, test_size=0.3, shuffle=False)
# send our pandas dataframe with data to our custom data class
train_dataset = MyDataset(df_train, window_size)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = MyDataset(df_test, window_size)
# evaluation order doesn't matter, so no need to shuffle the test set
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
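One idea I'm considering, sketched here with synthetic data rather than my real dataframes: since each dataset instance keeps its rolling windows inside its own dataframe, wrapping every dataframe in its own windowed dataset should make ConcatDataset safe again, because no window can ever straddle two dataframes. The `WindowDataset` class, `make_df` helper, column names, and sizes below are all placeholders for illustration.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class WindowDataset(Dataset):
    """Rolling-window dataset over a single dataframe; windows never cross dataframes."""
    def __init__(self, data, window):
        self.window = window
        self.x = torch.FloatTensor(data[['data_1', 'data_2', 'data_3']].values.astype('float'))
        self.y = torch.LongTensor(data['labels'].values.astype('int'))

    def __len__(self):
        return len(self.x) - self.window

    def __getitem__(self, index):
        # (window, channels) -> (channels, window) for a 1D CNN
        return self.x[index:index + self.window].t(), self.y[index + self.window]

def make_df(n):
    # stand-in for a real time-series dataframe
    rng = np.random.default_rng(0)
    return pd.DataFrame({'data_1': rng.normal(size=n), 'data_2': rng.normal(size=n),
                         'data_3': rng.normal(size=n), 'labels': rng.integers(0, 2, size=n)})

window = 10
frames = [make_df(100), make_df(80), make_df(60)]
# each dataframe contributes len(df) - window samples; ConcatDataset just chains them
combined = ConcatDataset([WindowDataset(df, window) for df in frames])
loader = DataLoader(combined, batch_size=32, shuffle=True)
```

Shuffling the combined loader is fine here because every sample is already a complete window.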
A method that currently works, but that I'd rather not keep doing:
# dictionary full of multiple pandas dataframes
count += 1
# wrap around before indexing, otherwise count can run past the last key
if count >= len(dictionary.data):
    count = 0
str_txt = list(dictionary.data.keys())[count]
data = dictionary.data[str_txt]
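The manual counter could also be replaced with itertools.cycle, which handles the wrap-around automatically. A minimal sketch, where the `frames` dict is a made-up stand-in for my real dictionary of dataframes:

```python
import itertools
import pandas as pd

# hypothetical dict of per-source dataframes keyed by name
frames = {'series_a': pd.DataFrame({'v': range(5)}),
          'series_b': pd.DataFrame({'v': range(3)})}

# cycle endlessly over (name, dataframe) pairs in insertion order;
# call next(...) once per training episode to move to the next dataframe
frame_iter = itertools.cycle(frames.items())

name, data = next(frame_iter)  # first episode gets 'series_a'
```

After the last dataframe, the next call simply starts over from the first, so there is no counter to reset.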