I have a dataset of human-labelled facial images from 40 participants, stored in the following file structure:
root/participant1/video1/frame0001.jpg
…
root/participant1/video2/frame0001.jpg
…
root/participant2/video2/frame0001.jpg
etc
As the videos were annotated at 20-second intervals, many images share a single label. The labels are stored in an xlsx file with the columns:
- participantid
- videoid
- segment index (which 20s interval the label belongs to)
- target variable
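To make the many-to-one mapping concrete, here is a minimal sketch of how each frame path could be turned into a label key. The frame rate (`FPS = 25`), zero-based segment indices, frame numbers starting at 1, and the helper name `label_key` are all assumptions for illustration, not known properties of the dataset.

```python
import re
from pathlib import PurePosixPath

FPS = 25              # assumed frame rate; not stated for the real videos
SEGMENT_SECONDS = 20  # annotation interval from the description above


def label_key(frame_path, fps=FPS):
    """Return (participantid, videoid, segment index) for one frame path.

    Assumes the layout root/<participant>/<video>/frameNNNN.jpg, frame
    numbers starting at 1, and zero-based segment indices.
    """
    path = PurePosixPath(frame_path)
    participant, video = path.parts[-3], path.parts[-2]
    match = re.fullmatch(r"frame(\d+)", path.stem)
    if match is None:
        raise ValueError(f"unexpected frame name: {path.name}")
    frame_number = int(match.group(1))
    # Every block of fps * 20 consecutive frames shares one label row.
    segment = (frame_number - 1) // (fps * SEGMENT_SECONDS)
    return participant, video, segment
```

The returned key could then be looked up in a dict built from the xlsx rows (read with, for example, `pandas.read_excel`), so that a dataset's `__getitem__` returns the image together with the label shared by its segment.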
I am looking for an efficient PyTorch dataset implementation for loading these images that supports leave-one-out cross-validation across participants, i.e. training a CNN on all videos of all participants except one, and repeating until every participant has been left out once.
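The splitting scheme described above can be sketched independently of the image loading. This is only an illustration: the function name `leave_one_participant_out` is hypothetical, and it assumes one participant id is available per sample, in the same order as the dataset.

```python
from collections import defaultdict


def leave_one_participant_out(participants):
    """Yield (held_out, train_indices, test_indices) once per participant.

    `participants` holds one participant id per sample, in dataset order.
    """
    by_participant = defaultdict(list)
    for index, pid in enumerate(participants):
        by_participant[pid].append(index)
    for held_out, test_indices in by_participant.items():
        # Training indices are all samples from every other participant.
        train_indices = [
            i
            for pid, indices in by_participant.items()
            if pid != held_out
            for i in indices
        ]
        yield held_out, train_indices, test_indices
```

Each pair of index lists could be passed to `torch.utils.data.Subset` around a single full dataset, so the images only need to be indexed once. scikit-learn's `LeaveOneGroupOut` produces the same kind of grouped split if that dependency is acceptable.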
The issue I currently have is constructing a custom data structure that handles the many-images-to-one-label mapping while also allowing for the desired cross-validation. What is the best way to implement this in PyTorch?