I have an ordered dataset (X, Y) in which each data sample is a sliding window over X of length n. That is, the data used by the model is X' = [X[0:n], X[1:n+1], …]
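To make the definition concrete, here is a toy sketch of X' built explicitly. Plain Python lists stand in for the real array so it runs without dependencies, the sizes are made up, and the window count follows the `len(X) - n` used in the dataset class further down.

```python
# Toy stand-in for X: 6 time steps with 2 features each (sizes are made up).
X = [[2 * t, 2 * t + 1] for t in range(6)]
n = 3  # window length

# Materialize X' explicitly, with the same window count as the dataset class (len(X) - n).
X_prime = [X[i:i + n] for i in range(len(X) - n)]

for i, window in enumerate(X_prime):
    print(i, window)

# Every window stores n rows, so X' holds (len(X) - n) * n rows in total.
rows_in_X_prime = sum(len(w) for w in X_prime)
print(len(X), rows_in_X_prime)
```

Because consecutive windows overlap in all but one row, the materialized X' is roughly n times the size of X.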
Since both X and n are large, I cannot fit X' into memory. I tested an indexed dataset that constructs each window of X' on the fly:
import torch


class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, X, Y, n, **kwargs):
        super().__init__(**kwargs)
        self.X = X
        self.Y = Y
        self.n = n  # window length
        self.len = len(X) - n  # number of windows served

    def __getitem__(self, idx):
        # Slice one window of n consecutive rows on demand
        return self.X[idx:idx + self.n, :], ...

    def __len__(self):
        return self.len
While this works, it is extremely slow. Any ideas on how to speed it up?
(A CPU-only solution would be fine.)