Custom datatype for many images to one label

I have a dataset of human-labelled facial images from 40 participants, stored in the following file structure:

root/participant1/video1/frame0001.jpg
root/participant1/video2/frame0001.jpg
root/participant2/video2/frame0001.jpg
etc.

As the videos were annotated at 20-second intervals, many of these images share a single label. The labels are stored in an xlsx file with the columns:

-participantid
-videoid
-segment index (which 20s interval the label belongs to)
-target variable.

I am looking for an efficient PyTorch dataset implementation to load the images so that I can do leave-one-out cross-validation across participants, i.e. train a CNN on all videos of all participants except one, evaluate on the held-out participant, and repeat until every participant has been left out once.

The issue I currently have is constructing a custom data structure that deals with many images to one label as well as allowing for the desired cross-validation. What is the best way to implement this in PyTorch?

I’m not sure whether you want the Dataset to return multiple images with a single target, or whether you would rather repeat the target for every image in the same interval.
Both approaches are possible by writing a custom Dataset as described in this tutorial.
Depending on how the targets are stored, you might want to load them once in the Dataset.__init__ method and lazily load the image data (and index the target) in __getitem__.
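
As a rough sketch of the second approach (one image per sample, target repeated for every frame in the interval), something like the following might work. Several things here are my assumptions and not from your post: the class name FrameDataset, the labels dict keyed by (participant, video, segment index) that you would build from the xlsx file yourself, a known constant fps, zero-based segment indices, and frame numbers starting at 1. The Pillow-based default loader is also just one option.

```python
from pathlib import Path

from torch.utils.data import Dataset


def default_loader(path):
    # Pillow is imported lazily so the index can be built without it.
    from PIL import Image
    return Image.open(path).convert("RGB")


class FrameDataset(Dataset):
    """One sample per frame; frames in the same segment share a target.

    Hypothetical sketch: `labels` maps
    (participant_id, video_id, segment_index) -> target.
    """

    def __init__(self, root, labels, fps, segment_seconds=20,
                 loader=default_loader, transform=None):
        self.loader = loader
        self.transform = transform
        self.samples = []  # (image path, target) pairs
        self.groups = []   # participant id per sample, for group-wise splits
        frames_per_segment = int(fps * segment_seconds)

        # Only paths and targets are stored here; no image is opened yet.
        for path in sorted(Path(root).glob("*/*/frame*.jpg")):
            participant, video = path.parts[-3], path.parts[-2]
            frame_number = int(path.stem[len("frame"):])  # frame0001 -> 1
            # Assumes frames start at 1 and segments are zero-based.
            segment = (frame_number - 1) // frames_per_segment
            key = (participant, video, segment)
            if key in labels:  # frames without an annotation are skipped
                self.samples.append((path, labels[key]))
                self.groups.append(participant)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, target = self.samples[index]
        image = self.loader(path)  # lazy load: one image per call
        if self.transform is not None:
            image = self.transform(image)
        return image, target
```

The groups list is kept alongside the samples so that the participant of each sample is available later when splitting. If you prefer the first approach, __getitem__ would instead index segments and return all frames of one segment together with its single target.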

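I would recommend using the splitters from sklearn.model_selection to split the dataset indices according to your groups (here, the participants). E.g. GroupKFold might be useful, since it never places samples from the same group in both the training and the validation split.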

Once you have these split indices, you could wrap your dataset in torch.utils.data.Subset objects using these indices and perform your cross-validation.
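
A minimal sketch of that split, assuming you have one participant id per sample (the groups argument below). ToyDataset and leave_one_participant_out are hypothetical names used only for illustration. Setting n_splits to the number of participants makes GroupKFold hold out exactly one participant per fold.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from torch.utils.data import Dataset, Subset


class ToyDataset(Dataset):
    """Stand-in for the real image dataset (hypothetical, illustration only)."""

    def __init__(self, groups):
        self.groups = list(groups)

    def __len__(self):
        return len(self.groups)

    def __getitem__(self, index):
        # Returns the sample index and its participant id instead of an image.
        return index, self.groups[index]


def leave_one_participant_out(dataset, groups):
    """Yield (train_subset, test_subset) pairs, one fold per participant."""
    groups = np.asarray(groups)
    n_participants = len(np.unique(groups))
    # With n_splits equal to the number of groups, each fold's test split
    # contains exactly one participant.
    splitter = GroupKFold(n_splits=n_participants)
    placeholder = np.zeros(len(groups))  # split() only needs the sample count
    for train_idx, test_idx in splitter.split(placeholder, groups=groups):
        yield Subset(dataset, train_idx.tolist()), Subset(dataset, test_idx.tolist())


if __name__ == "__main__":
    groups = ["p1", "p1", "p2", "p3", "p3", "p3"]
    dataset = ToyDataset(groups)
    for train_set, test_set in leave_one_participant_out(dataset, groups):
        held_out = {test_set[i][1] for i in range(len(test_set))}
        print(f"held out: {held_out}, train size: {len(train_set)}")
```

Each Subset can then be passed to its own DataLoader, and a fresh model should be trained for every fold so that nothing leaks from the held-out participant.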