Memory efficiency vs. speed for sliding-window sequence datasets?

I have an ordered dataset (X, Y), where each data sample is a sliding window of X with length n. That is, the data the model actually sees is X’ = [ X[0:n], X[1:n+1], … ].
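
For concreteness, here is a toy sketch of what those windows look like (the sizes and values are made up purely for illustration; the real X is far too large for this):

import torch

# Toy example: 10 timesteps, 1 feature, window length 4 (made-up sizes)
X = torch.arange(10.0).reshape(10, 1)
n = 4

# X' is then the set of overlapping windows X[0:4], X[1:5], ..., X[5:9]
windows = [X[i:i + n] for i in range(len(X) - n)]
print(len(windows), windows[0].shape)  # 6 windows, each of shape (4, 1)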

Since both X and n are large, I cannot fit X’ into memory. Instead, I tried a map-style (indexed) Dataset that constructs each window on the fly:

import torch


class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, X, Y, n):
        super().__init__()
        self.X = X              # full sequence, shape (timesteps, features)
        self.Y = Y              # targets aligned with X
        self.n = n              # window length
        self.len = len(X) - n   # number of windows that fit in X

    def __getitem__(self, idx):
        # Slice one length-n window starting at idx (target lookup elided)
        return self.X[idx: idx + self.n, :], ...

    def __len__(self):
        return self.len
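
In case the consumption side matters, this is roughly how I'm feeding it to the model (batch size, worker count, and the training loop body are placeholders, not my exact setup):

from torch.utils.data import DataLoader

dataset = CustomDataset(X, Y, n)
# Placeholder settings, not the values I actually train with
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=0)

for batch in loader:
    # batch holds the collated windows (and targets) for one step
    pass  # training step goes here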

While this does work, it is extremely slow. Any ideas on how to speed it up?

(A CPU-only solution would be fine.)