I am loading many small .npy array files that each contain a variable-length sequence of elements of shape (T, N), where T is the sequence length of that particular file and N is the feature size (the same for all arrays).
I am currently loading each .npy array file, converting it to a tensor, and then appending it to a Python list. This all happens in the __init__ of a Dataset.
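For concreteness, here is a minimal sketch of what I am doing now (the directory path and class name are just placeholders):

```python
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    def __init__(self, npy_dir):
        # Eagerly load every .npy file into memory as a tensor.
        self.sequences = []
        for path in sorted(glob.glob(f"{npy_dir}/*.npy")):
            arr = np.load(path)  # shape (T, N); T varies per file
            self.sequences.append(torch.from_numpy(arr))

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx]
```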
Is there a more efficient way to load the data in my scenario? If my sequence lengths were constant, I could pre-allocate a tensor of the appropriate size and slice into it in memory directly from the loaded data. I have also heard about loading the data ‘lazily’ in __getitem__ for cases where the data is too large to fit in memory, but would that make a difference in the time it takes to load the data? Also, is my current approach amenable to multiprocessing via num_workers?
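For reference, the lazy-loading variant I have in mind would look roughly like this (again, the directory path and class name are placeholders), where each file is only read when its index is requested:

```python
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class LazySequenceDataset(Dataset):
    def __init__(self, npy_dir):
        # Only record the file paths; defer the actual reads to __getitem__.
        self.paths = sorted(glob.glob(f"{npy_dir}/*.npy"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # np.load happens per item, so it could run inside DataLoader worker processes.
        arr = np.load(self.paths[idx])  # shape (T, N)
        return torch.from_numpy(arr)
```

My understanding is that this version would pair with something like DataLoader(dataset, num_workers=4, ...), but I am not sure whether it actually ends up faster than loading everything up front in __init__.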