Generating a text/CSV file of image and mask paths for semantic segmentation

I have a huge set of images (60k) and masks (60k) that need to be loaded into a PyTorch DataLoader for semantic segmentation.

Directory Structure:

 - Segmentation
       -images
           -color_left_trajectory_3000_00001.jpg
           -color_left_trajectory_3000_00002.jpg
           ...
       -masks
           -color_segmentation_3000_00001.jpg
           -color_segmentation_3000_00002.jpg
           ...

I want to know the most efficient way to load these into a DataLoader in PyTorch. I was thinking of generating a CSV file with the paths to the images and masks. How would I go about generating it? Any other suggestions are appreciated!
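One way to produce such a CSV is sketched below. It assumes the layout shown above (`Segmentation/images` and `Segmentation/masks`) and that an image and its mask share the trailing frame id (e.g. `3000_00001.jpg`) after their respective prefixes. The function name `write_pairs_csv` and the output name `pairs.csv` are hypothetical.

```python
import csv
from pathlib import Path

# Assumed from the example file names in the question.
IMAGE_PREFIX = "color_left_trajectory_"
MASK_PREFIX = "color_segmentation_"


def write_pairs_csv(root, out_csv):
    """Write one (image, mask) row per image that has a matching mask.

    Returns the number of pairs written (header row not counted).
    """
    root = Path(root)
    rows = []
    for image_path in sorted((root / "images").glob(IMAGE_PREFIX + "*.jpg")):
        # Frame id shared by image and mask, e.g. "3000_00001.jpg".
        frame_id = image_path.name[len(IMAGE_PREFIX):]
        mask_path = root / "masks" / (MASK_PREFIX + frame_id)
        if mask_path.exists():
            rows.append((str(image_path), str(mask_path)))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "mask"])  # header row
        writer.writerows(rows)
    return len(rows)
```

Usage would be something like `write_pairs_csv("Segmentation", "pairs.csv")`. A Dataset can then read this file once in `__init__` and open each image/mask pair lazily in `__getitem__`, so only the 60k path strings, not the decoded images, are kept in memory.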

In my own use cases I lazily use glob.glob to get a list of images and then use string manipulation (.replace in particular) to compute the corresponding mask paths. This is somewhat fragile with respect to capitalization and .jpg vs. .jpeg extensions, but works well if your file names are reasonably tidy.
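A rough sketch of that glob-and-replace approach, assuming the naming pattern from the question (`color_left_trajectory_` in the image name becomes `color_segmentation_` in the mask name). `mask_path_for` and `list_pairs` are illustrative names, not part of any library.

```python
import glob
import os


def mask_path_for(image_path, mask_dir):
    # Only the file name is rewritten, so the prefix appearing somewhere
    # else in the directory path cannot be replaced by accident.
    name = os.path.basename(image_path)
    mask_name = name.replace("color_left_trajectory_", "color_segmentation_", 1)
    return os.path.join(mask_dir, mask_name)


def list_pairs(root):
    """Return sorted (image_path, mask_path) tuples; fail loudly on a missing mask."""
    image_dir = os.path.join(root, "images")
    mask_dir = os.path.join(root, "masks")
    pairs = []
    # "*.jpg" never matches ".jpeg" and, on POSIX systems, not ".JPG" either.
    for image_path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
        mask_path = mask_path_for(image_path, mask_dir)
        if not os.path.isfile(mask_path):
            raise FileNotFoundError(f"no mask for {image_path}: {mask_path}")
        pairs.append((image_path, mask_path))
    return pairs
```

Checking that every mask exists up front turns a silent mismatch into an immediate error instead of a failure partway through a training epoch.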

A more modern way is to use pathlib.Path, perhaps with .iterdir() instead of the (also available) .glob(...). If you care about the cleanliness of your code, that might be worth looking at.
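A pathlib variant using `.iterdir()` might look like the sketch below. Instead of rewriting strings, it indexes both folders by the frame id (e.g. `3000_00001`) taken from the file stem, so a `.jpg` image and a `.jpeg` or `.JPG` mask still pair up. The prefixes, the accepted extensions in `IMAGE_EXTS`, and the helper names are assumptions for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your data


def index_by_frame(directory, prefix):
    """Map frame id (e.g. "3000_00001") to path for files named prefix + id + ext."""
    index = {}
    for path in Path(directory).iterdir():
        if (
            path.is_file()
            and path.suffix.lower() in IMAGE_EXTS  # case-insensitive extension
            and path.name.startswith(prefix)
        ):
            index[path.stem[len(prefix):]] = path
    return index


def pair_by_frame(root):
    """Return (image, mask) Path pairs sorted by frame id."""
    root = Path(root)
    images = index_by_frame(root / "images", "color_left_trajectory_")
    masks = index_by_frame(root / "masks", "color_segmentation_")
    missing = sorted(images.keys() - masks.keys())
    if missing:
        raise ValueError(f"{len(missing)} images have no mask, e.g. {missing[0]}")
    return [(images[k], masks[k]) for k in sorted(images)]
```

`pair_by_frame("Segmentation")` returns a sorted list of `(image, mask)` pairs that could be written to a CSV or handed straight to a Dataset. Masks without a matching image are simply ignored in this sketch.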

Best regards

Thomas

Okay, makes sense. Will check it out. Thank you!