I’m training an autoencoder network to remove “dirt” from images. For this I have two folders, `dirty` and `clean`. Currently I load the data like this:
```python
dirty_data = torchvision.datasets.ImageFolder(root='data/dirty', transform=transform)
clean_data = torchvision.datasets.ImageFolder(root='data/clean', transform=transform)

train_dirty_loader = torch.utils.data.DataLoader(dirty_data, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
train_clean_loader = torch.utils.data.DataLoader(clean_data, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
```
There are images with the same names in the `dirty` and `clean` folders (the same images, with and without “dirt”). This method of having two separate loaders does work, but comes with a number of issues:
- It’s ugly
- I can’t use `shuffle=True`, since this would put `train_dirty_loader` and `train_clean_loader` out of sync with each other (training depends on the clean and dirty images arriving in the same order).
- I can’t split the dataset using the `random_split` function, for the same reason as above.
How should I solve this?
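One way this is commonly handled is with a small paired `Dataset` wrapper that indexes both underlying datasets together, so each item is already a `(dirty, clean)` pair and shuffling or splitting can never desynchronize them. The sketch below is illustrative, not a definitive solution: the class name `PairedDataset` is mine, and it assumes the two folders contain the same filenames so that the two `ImageFolder` instances (which sort files) are index-aligned. For brevity, the demo uses stand-in random tensors instead of real image folders.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class PairedDataset(Dataset):
    """Wraps two index-aligned datasets and yields (dirty, clean) pairs.

    Assumes both datasets have the same length and that index i refers
    to the same underlying image in each (ImageFolder sorts filenames,
    so identical names in both folders keep the indices aligned).
    """
    def __init__(self, dirty, clean):
        assert len(dirty) == len(clean), "folders must contain the same images"
        self.dirty, self.clean = dirty, clean

    def __len__(self):
        return len(self.dirty)

    def __getitem__(self, idx):
        dirty_img, _ = self.dirty[idx]   # ImageFolder items are (image, class_index)
        clean_img, _ = self.clean[idx]   # the class index is irrelevant here
        return dirty_img, clean_img

# Demo with stand-in (tensor, label) tuples instead of real ImageFolders:
dirty = [(torch.rand(3, 8, 8), 0) for _ in range(10)]
clean = [(torch.rand(3, 8, 8), 0) for _ in range(10)]
paired = PairedDataset(dirty, clean)

# shuffle=True and random_split now operate on pairs, keeping them in sync
train_set, val_set = random_split(paired, [8, 2])
loader = DataLoader(train_set, batch_size=4, shuffle=True)
d, c = next(iter(loader))
```

With real data you would construct it as `PairedDataset(dirty_data, clean_data)` using the two `ImageFolder` datasets from the question. One caveat if you go this route: any random transform (e.g. a random crop) in `transform` would still be sampled independently for the two images, so augmentation would need to be applied jointly inside `__getitem__` instead.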