I have the following questions.
- What is currently the most popular way to store a large dataset (more than 30 GB) for use with PyTorch?
- Why do I see people split a large dataset across multiple HDF5 files instead of keeping it in a single one? Does that improve efficiency?
- How can I load multiple HDF5 files efficiently in a DataLoader? Basically, how should I write `__init__()` and `__getitem__()`? I saw the post DataLoader, when num_worker >0, there is bug, but it only discussed a single HDF5 file.
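To make the third question concrete, here is a minimal sketch of one common pattern: record only the per-file lengths in `__init__`, and open the HDF5 handles lazily on first access so that each DataLoader worker process gets its own handles (opening them in `__init__` is what triggers the `num_workers > 0` bug, since HDF5 handles don't survive forking). The file layout and the dataset key `"data"` are assumptions for illustration, not anything from the original post.

```python
import bisect

import h5py
import numpy as np


class MultiH5Dataset:
    """Map-style dataset over several HDF5 files.

    Duck-typed for torch.utils.data.DataLoader, which only needs
    __len__ and __getitem__ on a map-style dataset. Assumes every
    file contains a dataset under the same key (here "data",
    a hypothetical name).
    """

    def __init__(self, paths, key="data"):
        self.paths = list(paths)
        self.key = key
        # Read only the lengths up front; keep no open handles so the
        # object pickles/forks safely into DataLoader worker processes.
        lengths = []
        for p in self.paths:
            with h5py.File(p, "r") as f:
                lengths.append(len(f[self.key]))
        # Cumulative offsets: global index -> (file index, local index).
        self.offsets = np.cumsum([0] + lengths)
        self._files = None  # opened lazily, once per process

    def __len__(self):
        return int(self.offsets[-1])

    def __getitem__(self, idx):
        if self._files is None:
            # First access in this process: open handles here, not in
            # __init__, so each worker owns independent HDF5 state.
            self._files = [h5py.File(p, "r") for p in self.paths]
        file_idx = bisect.bisect_right(self.offsets, idx) - 1
        local_idx = idx - self.offsets[file_idx]
        return self._files[file_idx][self.key][local_idx]
```

With this in place you can pass the object straight to `DataLoader(MultiH5Dataset(paths), num_workers=4, ...)`; whether it helps over a single file depends mostly on your storage and chunking, so treat it as a starting point rather than the definitive answer.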