How to write a DataLoader for paired images?

Hi everyone, I am pretty new to PyTorch, and I am stuck on how to write the DataLoader for my task.

I have a haze image dataset. It contains a folder of clear images and a folder of haze images. All of the haze images are synthesized from the clear images, and each clear image generates 30 haze images.

For example, if there is a city.png image in the clear folder, then there will be city_a1, city_a2, …, city_a30 images in the haze folder. So the number of haze images is much larger than the number of clear images.

I want to train a network to dehaze. Basically, I need image pairs of (clear image, related haze image), that is, each haze image has to be matched with the clear image it was generated from. So this is a one-to-many problem.

I have only used datasets.ImageFolder before, but I do not think it will work for my task. Could anyone give me a hint on how to design my own DataLoader to read image pairs? Thanks!

We had a similar issue described here, but with multiple target images for a single input image.
I think the code given in the other topic might be a good starting point; you would have to swap the data paths for the target paths.

Depending on your use case, you could then either sample a single data-target pair in __getitem__ or return all data images for a target at once.
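
For the first option (one data-target pair per index), here is a minimal sketch rather than the code from the other topic. It assumes that every haze file is named after its clear image plus one underscore-separated suffix (city_a1.png belongs to city.png). The class name `PairedHazeDataset`, the `default_loader` helper, and the `loader` argument are hypothetical names made up for this illustration. The class only implements `__len__` and `__getitem__`, which is what a map-style dataset needs; in a real project you would usually also subclass `torch.utils.data.Dataset`.

```python
from pathlib import Path


def default_loader(path):
    # Imported lazily so the pairing logic itself does not require Pillow.
    from PIL import Image
    return Image.open(path).convert("RGB")


class PairedHazeDataset:
    """Map-style dataset that yields (haze image, clear image) pairs.

    Assumption: a haze file named "<stem>_<suffix>.<ext>" belongs to the
    clear file whose stem is "<stem>", e.g. city_a1.png -> city.png.
    """

    def __init__(self, clear_dir, haze_dir, transform=None, loader=default_loader):
        self.transform = transform
        self.loader = loader

        # Index the clear images by stem: "city" -> <clear_dir>/city.png
        clear_by_stem = {
            p.stem: p for p in Path(clear_dir).iterdir() if p.is_file()
        }

        # One entry per haze image, so each clear image appears many times.
        self.pairs = []
        for haze_path in sorted(Path(haze_dir).iterdir()):
            if not haze_path.is_file():
                continue
            # Drop the last "_..." part of the name to recover the clear stem.
            clear_stem = haze_path.stem.rsplit("_", 1)[0]
            clear_path = clear_by_stem.get(clear_stem)
            if clear_path is not None:  # skip haze files without a match
                self.pairs.append((haze_path, clear_path))

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        haze_path, clear_path = self.pairs[index]
        haze = self.loader(haze_path)
        clear = self.loader(clear_path)
        if self.transform is not None:
            # Use a deterministic transform here; a random crop or flip
            # applied separately would misalign the two images.
            haze = self.transform(haze)
            clear = self.transform(clear)
        return haze, clear
```

With torch and torchvision installed, an instance built with `transform=transforms.ToTensor()` can be passed to `torch.utils.data.DataLoader` with `batch_size` and `shuffle=True` as usual, provided all images have the same size. For the second option, you could index by clear image instead and return the list of all its haze images from `__getitem__`.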
