NumPy Dataset - Single large file using mmap, small files per example, or something else?

I’ve got all my data in NumPy array files. Which way of storing the data would be most efficient for PyTorch, particularly with regard to multiple workers and shuffling? Would PyTorch’s approach of multiple workers with shuffled indices make memmap reading inefficient, since random access defeats sequential read-ahead? Or is the overhead of opening tons of tiny files worse? Will the difference between the two approaches be significant? Thank you!