How to access sliding window data with only one index in the __getitem__ method?

Hi, I am working on a continuous classification problem. My time series data has the following shape:
x: (n_trials, n_features, n_time)
y: (n_trials,)

The trials have different lengths, so n_windows differs from trial to trial.

Then I applied a sliding window, so the data now has this 4D shape:
x: (n_trials, n_windows, n_features, window_length)
y: (n_trials, n_windows)
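Because n_windows differs between trials, the windowed data cannot be stored in one regular 4D NumPy array unless you pad it; keeping a list (or dict) of per-trial arrays is one way around that. A minimal sketch, assuming window_length and stride are already expressed in samples and using NumPy's `sliding_window_view` (NumPy 1.20+); `window_trial` is a hypothetical helper, and the trial data is made up:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_trial(trial, window_length, stride):
    """Cut one trial of shape (n_features, n_time) into windows.

    Returns an array of shape (n_windows, n_features, window_length),
    where n_windows = (n_time - window_length) // stride + 1.
    """
    # Every window with step 1: shape (n_features, n_time - window_length + 1, window_length)
    views = sliding_window_view(trial, window_length, axis=-1)
    # Keep every `stride`-th window, then move the window axis to the front
    return np.ascontiguousarray(views[:, ::stride, :].transpose(1, 0, 2))


# Hypothetical example: 3 trials of different lengths, 4 features each
rng = np.random.default_rng(0)
trials = [rng.standard_normal((4, n_time)) for n_time in (300, 450, 600)]
windowed = [window_trial(t, window_length=125, stride=10) for t in trials]
print([w.shape for w in windowed])
```

Each trial keeps its own number of windows (18, 33 and 48 here), so indexing into "trial, then window" stays possible without padding.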

Not all trials will be used; I will pass in a list of eligible trial numbers.

There are two things I would like to ask regarding the PyTorch Dataset class:

  1. The __getitem__ method only takes one index, and my objective is to return one window of 2D data shaped (n_features, window_length) and one integer for y. I would like the index to refer to a specific window, but to access a window I must first get into the corresponding trial.
    Also, the eligible trial numbers may not start from 1. With only one index, how can I get to the specific window I want?

  2. Is the logic in (1) correct? Must I return one window at a time, or can I return an entire trial instead? My eventual goal is to make a prediction every 40 ms with a window size of 500 ms.
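With a sliding window, the stride (hop) sets how often a prediction is produced and the window length sets how much history each prediction sees, so "one window per __getitem__ call" lines up with "one prediction per 40 ms". A small sketch of the arithmetic, assuming a hypothetical sampling rate of 250 Hz (the post does not state one); `ms_to_samples` and `n_windows` are illustrative helpers, not library functions:

```python
def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    n = duration_ms * fs_hz / 1000
    if n != int(n):
        raise ValueError(f"{duration_ms} ms is not a whole number of samples at {fs_hz} Hz")
    return int(n)


def n_windows(n_time, window_length, stride):
    """Number of full windows that fit in a trial of n_time samples."""
    if n_time < window_length:
        return 0
    return (n_time - window_length) // stride + 1


fs = 250  # hypothetical sampling rate in Hz
window_length = ms_to_samples(500, fs)  # 125 samples of history per prediction
stride = ms_to_samples(40, fs)  # 10 samples, i.e. one prediction every 40 ms
print(window_length, stride, n_windows(2500, window_length, stride))
```

At 250 Hz, a 10-second trial (2500 samples) would give 238 windows, hence 238 predictions.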

Thank you!

If I understand you correctly, you want a single index (sometimes called a “linear” index in other contexts) into (n_trials, n_windows). The easiest approach is likely to loop over the eligible trials and their windows once at the beginning and build a list of (idx_trial, idx_window) pairs. Then you can use the index passed into __getitem__ to look up the corresponding (idx_trial, idx_window) and fetch the data from there.
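A minimal sketch of that lookup-table approach, assuming the windowed data is kept per trial in dicts keyed by trial number (so eligible trial numbers need not start at 1 or be contiguous). `WindowDataset`, `trials_x` and `trials_y` are hypothetical names. In practice you would usually subclass `torch.utils.data.Dataset`; it is left out here only so the sketch runs with NumPy alone, since a map-style dataset for `DataLoader` should only need `__len__` and `__getitem__`:

```python
import numpy as np


class WindowDataset:
    """Map-style dataset where one flat index selects one window.

    trials_x: dict, trial number -> array (n_windows_i, n_features, window_length)
    trials_y: dict, trial number -> array (n_windows_i,)
    eligible: trial numbers to use (any order, need not start at 1)
    """

    def __init__(self, trials_x, trials_y, eligible):
        self.trials_x = trials_x
        self.trials_y = trials_y
        # Build the (idx_trial, idx_window) lookup table once, up front
        self.index = [
            (idx_trial, idx_window)
            for idx_trial in eligible
            for idx_window in range(len(trials_y[idx_trial]))
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        # Translate the flat index into a (trial, window) pair
        idx_trial, idx_window = self.index[idx]
        x = self.trials_x[idx_trial][idx_window]  # (n_features, window_length)
        y = int(self.trials_y[idx_trial][idx_window])
        return x, y


# Hypothetical data: trials 3, 7 and 9 with 2, 4 and 3 windows
rng = np.random.default_rng(0)
n_win = {3: 2, 7: 4, 9: 3}
trials_x = {t: rng.standard_normal((n, 4, 125)) for t, n in n_win.items()}
trials_y = {t: rng.integers(0, 2, size=n) for t, n in n_win.items()}

ds = WindowDataset(trials_x, trials_y, eligible=[7, 9])
print(len(ds))  # 7 windows from trials 7 and 9
x, y = ds[4]  # first window of trial 9
print(x.shape, y)
```

Shuffling in the DataLoader then mixes windows across trials, because the sampler only ever sees flat indices 0 to len(ds) - 1.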
If the loop takes too long at the start of training, an alternative is to do it once during preprocessing and store the result (I would probably use Pandas and a CSV file, but you could also use Python’s pickle or even torch.save).
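A sketch of the preprocessing variant using pickle, one of the options mentioned above. `build_index`, `save_index`, `load_index` and the file name are hypothetical; the Dataset's `__init__` would then load the stored list instead of rebuilding it:

```python
import os
import pickle
import tempfile


def build_index(n_windows_per_trial, eligible):
    """List every (idx_trial, idx_window) pair for the eligible trials."""
    return [
        (idx_trial, idx_window)
        for idx_trial in eligible
        for idx_window in range(n_windows_per_trial[idx_trial])
    ]


def save_index(index, path):
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical preprocessing run: window counts per trial number
n_windows_per_trial = {3: 2, 7: 4, 9: 3}
index = build_index(n_windows_per_trial, eligible=[7, 9])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "window_index.pkl")
    save_index(index, path)
    restored = load_index(path)  # e.g. at the start of training
print(restored == index)
```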

Best regards

Thomas

Hi Tom

Thanks so much for your help! Really appreciate it.
