What's the best practice for loading sequence data?

I’d like to load each item as an image sequence (TxHxWx3), but a DataLoader with multiple workers only parallelizes reading across the batch dimension N, not across the time steps T. Is there a better way to parallelize over T?

I’ve tried writing a __getitem__ that spawns multiple processes, but it raises the error “daemonic processes are not allowed to have children”.

The bottleneck is usually the number of available CPU cores (typically 4 to 20).
Trying to parallelize over both N and T with N * T workers won't help, because the extra workers just compete for the same cores.
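As a rough back-of-the-envelope model, assuming frame decoding is CPU-bound, the number of frames decoded at the same instant is capped by the core count no matter how many workers are started. The helper name and the N, T values below are hypothetical, chosen only for illustration:

```python
import os


def concurrent_decodes(num_workers: int, num_cores: int) -> int:
    # Simplified model (assumes CPU-bound decoding): a worker needs a core
    # to make progress, so at most num_cores workers decode at once.
    return min(num_workers, num_cores)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    n, t = 16, 32  # example batch size N and sequence length T
    print("N workers:    ", concurrent_decodes(n, cores))
    print("N * T workers:", concurrent_decodes(n * t, cores))
```

On a machine with 16 or fewer cores both lines print the same number: going from N to N * T workers adds scheduling and memory overhead without adding throughput.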

You probably want to load just one sample (a full TxHxWx3 sequence) per worker process.
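A minimal sketch of that approach, assuming a map-style PyTorch Dataset: each __getitem__ reads all T frames of one sequence sequentially, and the DataLoader's worker processes load different sequences in parallel. `load_frame` is a hypothetical stand-in for real image decoding (e.g. with PIL or OpenCV) and just returns deterministic dummy frames; `SequenceDataset` and its parameters are likewise illustrative names.

```python
import torch
from torch.utils.data import DataLoader, Dataset


def load_frame(seq_idx: int, t: int, height: int, width: int) -> torch.Tensor:
    # Hypothetical frame reader: replace with real image decoding.
    # Returns a deterministic dummy HxWx3 uint8 frame.
    g = torch.Generator().manual_seed(seq_idx * 100003 + t)
    return torch.randint(0, 256, (height, width, 3), dtype=torch.uint8, generator=g)


class SequenceDataset(Dataset):
    """Each item is one whole sequence of shape (T, H, W, 3)."""

    def __init__(self, num_sequences: int, seq_len: int, height: int, width: int):
        self.num_sequences = num_sequences
        self.seq_len = seq_len
        self.height = height
        self.width = width

    def __len__(self) -> int:
        return self.num_sequences

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Frames are read one after another inside a single worker;
        # parallelism comes from different workers loading different sequences.
        frames = [load_frame(idx, t, self.height, self.width) for t in range(self.seq_len)]
        return torch.stack(frames)  # (T, H, W, 3)


if __name__ == "__main__":
    ds = SequenceDataset(num_sequences=64, seq_len=8, height=32, width=32)
    # num_workers is typically set near the number of available CPU cores.
    loader = DataLoader(ds, batch_size=4, num_workers=2)
    for batch in loader:
        print(batch.shape)  # torch.Size([4, 8, 32, 32, 3])
        break
```

The default collate function stacks the per-sample (T, H, W, 3) tensors into an (N, T, H, W, 3) batch, so no extra processes are created inside __getitem__ and the daemonic-process error never arises.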