For a research project, I need to load a dataset of images together with numerical labels for drive and motor data. Each image is also labeled with a flag, and I need to be able to choose at runtime which image flags to ignore and which to keep. The current data loading system was written and optimized for Caffe and uses the HDF5 format to store these files. During training, extracting and converting the HDF5 data is the main bottleneck. Because of the ignore-flag feature, I was not able to use the default PyTorch Dataset to load the data dynamically, since every call to `__getitem__` must return an item. I am looking to convert our current dataset into a format that is optimized for loading into PyTorch and that also supports loading a frame and skipping it if it has an ignored flag set. My current code for importing the dataset and using it for training is available here:
### Loading hdf5 Dataset w/ Ignore List
### Use of Dataset in Training Code
What would be the best format to use for my particular use case? Are there any examples of setting up a similar type of dataset?
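
To make the requirement concrete, here is a minimal sketch of the filtering behavior I am after. It is not my actual code: the class name `FlagFilteredDataset`, the `set_ignore_flags` method, and the toy flag values are all hypothetical. It assumes the flags are cheap to read up front, so the ignored frames can be filtered out of the index list before any image is loaded, which means `__getitem__` always has something to return. It is written without importing torch so it runs on its own; in real code it would presumably subclass `torch.utils.data.Dataset`, which for map-style datasets relies on `__len__` and `__getitem__`.

```python
class FlagFilteredDataset:
    """Map-style dataset sketch that only exposes frames whose flag is kept.

    Hypothetical illustration: `images`, `labels`, and `flags` are parallel
    sequences standing in for whatever the on-disk format provides.
    """

    def __init__(self, images, labels, flags, ignore_flags=()):
        self.images = images
        self.labels = labels
        self.flags = flags
        self.set_ignore_flags(ignore_flags)

    def set_ignore_flags(self, ignore_flags):
        # Rebuild the list of usable frame indices from the flags alone,
        # so no image data is read for frames that will be skipped.
        ignore = set(ignore_flags)
        self.keep = [i for i, flag in enumerate(self.flags) if flag not in ignore]

    def __len__(self):
        # Length reflects only the frames that survive the current filter.
        return len(self.keep)

    def __getitem__(self, idx):
        # Map the filtered position back to the real frame index.
        real_idx = self.keep[idx]
        return self.images[real_idx], self.labels[real_idx]


# Toy stand-ins for image data, (drive, motor) labels, and per-frame flags.
images = ["img0", "img1", "img2", "img3"]
labels = [(0.1, 0.5), (0.2, 0.6), (0.3, 0.7), (0.4, 0.8)]
flags = ["keep_me", "skip_me", "keep_me", "skip_me"]

dataset = FlagFilteredDataset(images, labels, flags, ignore_flags=["skip_me"])
for i in range(len(dataset)):
    print(dataset[i])
```

One caveat with this approach, as far as I understand it: when a `DataLoader` uses worker processes, each worker holds its own copy of the dataset, so the ignore list would probably need to be changed between epochs rather than in the middle of one.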