Hi!
I’m training an autoencoder network to remove “dirt” from images. For this I have two folders: `dirty` and `clean`. Currently I load the data like this:
dirty_data = torchvision.datasets.ImageFolder(root='data/dirty', transform=transform)
clean_data = torchvision.datasets.ImageFolder(root='data/clean', transform=transform)
train_dirty_loader = torch.utils.data.DataLoader(dirty_data, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
train_clean_loader = torch.utils.data.DataLoader(clean_data, batch_size=BATCH_SIZE, num_workers=0, shuffle=False)
There are images in the `dirty` and `clean` folders with the same name (the same images, with and without “dirt”). This method of having two separate loaders does work, but it comes with a number of issues:
- It’s ugly.
- I can’t use `shuffle=True`, since this would put `train_dirty_loader` and `train_clean_loader` out of “sync” with each other (training currently depends on the clean and dirty images coming in the same order).
- I can’t split the dataset using the `random_split` function, for the same reason as above.
How should I solve this?
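One possible direction, as a minimal sketch rather than a definitive answer: a single map-style dataset that returns both images of a pair, so that shuffling and splitting operate on pairs instead of on two independent loaders. The names `match_pairs`, `PairedImageDataset`, and the `loader` argument are hypothetical, chosen for illustration. The sketch assumes the two folders mirror each other's layout and file names, and that Pillow is available for the default image loading.

```python
from pathlib import Path


def match_pairs(dirty_root, clean_root):
    """Return sorted (dirty_path, clean_path) tuples for files present in both trees."""
    dirty_root, clean_root = Path(dirty_root), Path(clean_root)
    pairs = []
    # rglob also covers the per-class subfolders that ImageFolder expects.
    for dirty_path in sorted(dirty_root.rglob("*")):
        if not dirty_path.is_file():
            continue
        clean_path = clean_root / dirty_path.relative_to(dirty_root)
        if clean_path.is_file():  # skip images that have no counterpart
            pairs.append((dirty_path, clean_path))
    return pairs


def _pil_loader(path):
    # Assumption: Pillow is installed; imported lazily so that the path
    # matching above has no third-party dependency.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class PairedImageDataset:
    """Map-style dataset: item i is the (dirty, clean) pair for one file name.

    It only defines __len__ and __getitem__, which is the protocol DataLoader
    and random_split rely on; subclassing torch.utils.data.Dataset also works.
    """

    def __init__(self, dirty_root, clean_root, transform=None, loader=None):
        self.pairs = match_pairs(dirty_root, clean_root)
        self.transform = transform
        self.loader = loader or _pil_loader

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        dirty_path, clean_path = self.pairs[index]
        dirty, clean = self.loader(dirty_path), self.loader(clean_path)
        if self.transform is not None:
            # The same transform object is applied to both images.
            dirty, clean = self.transform(dirty), self.transform(clean)
        return dirty, clean


# Usage sketch (requires torch; transform and BATCH_SIZE as in the post):
# dataset = PairedImageDataset("data/dirty", "data/clean", transform=transform)
# loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
# for dirty_batch, clean_batch in loader:
#     ...
```

Because each index yields a complete pair, `shuffle=True` and `torch.utils.data.random_split` should keep dirty and clean images together. One caveat: if `transform` contains random augmentations (random crops or flips), calling it separately on each image would give the two images different augmentations, so that case would need extra care.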