Custom Image Dataset good practice? (Working Memory Issues)

Hello everyone!
I am creating my own custom image dataset using torch's Dataset class.
So far, I iterate through all .jpg files in a given folder and store them in a list by appending.
This consumes a lot of working memory, and it takes ages to load the dataset.
I was wondering what a smart way to load the images would be. What is considered good practice when working with a lot of images?
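For reference, my current (eager) loading looks roughly like this sketch (`data/train` is just a placeholder path):

```python
import os
from PIL import Image

# rough sketch of my current approach: every image is opened and
# fully decoded up front, so the whole dataset sits in RAM
images = []
for fname in os.listdir("data/train"):
    if fname.lower().endswith(".jpg"):
        img = Image.open(os.path.join("data/train", fname)).convert("RGB")
        images.append(img)
```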

I was thinking about storing only the paths to the images (as .csv / .json) and loading the images in the `__getitem__(self, idx)` method, when they are actually needed.

What’s the most efficient way?
Thanks for any suggestions!

The second one is actually the way that, for example, the ImageFolder dataset class loads the images. So the best practice is to collect the paths to your images (for example with os.listdir()) and load each image in the `__getitem__` method.
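A minimal sketch of such a lazy-loading dataset (the class name `LazyImageDataset`, the folder layout, and the 224×224 resize are placeholders, adapt them to your setup):

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class LazyImageDataset(Dataset):
    def __init__(self, root, transform=None):
        # only the file paths are kept in memory, not the decoded images
        self.paths = [os.path.join(root, f)
                      for f in sorted(os.listdir(root))
                      if f.lower().endswith(".jpg")]
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # the image is opened and decoded only when this sample is requested
        img = Image.open(self.paths[idx]).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img
```

You can then wrap it in a DataLoader with `num_workers > 0`, so the decoding happens in background worker processes while the GPU trains:

```python
from torch.utils.data import DataLoader
from torchvision import transforms

# Resize makes all samples the same shape so the default collate can
# stack them into a batch; ToTensor converts PIL images to tensors
dataset = LazyImageDataset(
    "data/train",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
```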
