Loading data for multilabel classification, too large for memory

I have one folder containing ~30k images. There are 19 classes, and an image can belong to n of them (multilabel classification), so I can't use the ImageFolder function. I also need the name of each image file, since the filename contains the ID of the image and there is a corresponding DataFrame with the classes of each ID. What is the most efficient way to load this data?

You can solve this by implementing your own version of torch.utils.data.Dataset. A detailed tutorial here 🙂
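
For illustration, here is a minimal sketch of such a custom Dataset. It assumes the DataFrame is indexed by image ID with one 0/1 column per class, and that the image ID is simply the filename without its extension — adjust those parts to your actual data. Images are opened lazily in `__getitem__`, so the whole set never has to fit into memory.

```python
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class MultiLabelImageDataset(Dataset):
    """Loads images lazily from disk and looks up multilabel targets in a DataFrame."""

    def __init__(self, image_dir, labels_df, transform=None):
        # Assumption: labels_df is indexed by image ID and has one column
        # per class (19 columns of 0/1 values).
        self.image_dir = image_dir
        self.labels_df = labels_df
        self.transform = transform
        self.filenames = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        filename = self.filenames[idx]
        # Assumption: the image ID is the filename without its extension.
        image_id = os.path.splitext(filename)[0]

        image = Image.open(os.path.join(self.image_dir, filename)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)

        # Multi-hot target vector; float dtype works with BCEWithLogitsLoss.
        target = torch.tensor(self.labels_df.loc[image_id].values, dtype=torch.float32)
        return image, target
```

Usage would then look something like this (the CSV path, column name, and image folder are placeholders):

```python
import pandas as pd
from torch.utils.data import DataLoader
from torchvision import transforms

labels_df = pd.read_csv("labels.csv", index_col="id")  # hypothetical label file
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

dataset = MultiLabelImageDataset("path/to/images", labels_df, transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
```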