Get two samples from different categories in one dataset

EEQ · August 29, 2022, 6:01am

I’m trying to define a Dataset/Dataloader for an image style transfer network. I have a dataset of images grouped by styles, and I want each sample from this dataset to consist of two images, one for style, the other for content. My first idea was to implement a Dataset with something like this in __init__:

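import random  # at module level
n = len(images)
# random.shuffle works in place and returns None, so build each permutation with random.sample
itr_style = random.sample(range(n), n)
itr_content = random.sample(range(n), n)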

and this in __getitem__:

return (images[itr_style[index]], images[itr_content[index]])

This is probably not the most efficient implementation, and I also need to make sure that:

The two images don’t come from the same style
The dataset re-shuffles every epoch

What is the best way to implement this Dataset?

nivek · August 29, 2022, 2:09pm

If I understand correctly, you should be able to compute all valid combinations of indices (i, j) ahead of time, where i indexes the style image and j indexes the content image. Basically, you can do something like:

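# Assumption: styles[k] holds the style label of images[k] (hypothetical name)
valid_combinations = []
for i in range(n):
    for j in range(n):
        if styles[i] != styles[j]:  # images don't come from the same style
            valid_combinations.append((i, j))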

Your __getitem__ can read the indices from valid_combinations and then get the corresponding images.
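
Putting that together, here is a rough sketch rather than a definitive implementation. It assumes each image's style label lives in a parallel list, which I'm calling `styles`; that name, the class name `StyleContentPairs`, and the toy data are made up for illustration. The class is plain Python so it runs without PyTorch, but it follows the map-style protocol (`__len__` and `__getitem__`), so it should work if you subclass `torch.utils.data.Dataset` instead.

```python
class StyleContentPairs:
    """Map-style dataset sketch: each item is a (style_image, content_image) pair.

    Hypothetical names: `images` is any indexable collection and `styles[k]`
    is the style label of `images[k]`.
    """

    def __init__(self, images, styles):
        assert len(images) == len(styles)
        self.images = images
        n = len(images)
        # Precompute every (i, j) whose images come from different styles.
        # This list grows as O(n^2), so it only suits modest dataset sizes.
        self.valid_combinations = [
            (i, j)
            for i in range(n)
            for j in range(n)
            if styles[i] != styles[j]
        ]

    def __len__(self):
        return len(self.valid_combinations)

    def __getitem__(self, index):
        # i selects the style image, j selects the content image.
        i, j = self.valid_combinations[index]
        return self.images[i], self.images[j]


# Toy data standing in for real images.
images = ["a0", "a1", "b0", "c0"]
styles = ["A", "A", "B", "C"]
dataset = StyleContentPairs(images, styles)
```

For the re-shuffling requirement, passing the dataset to a `DataLoader` with `shuffle=True` draws a new random order of the pairs each epoch, so the dataset itself doesn't need to track any permutation. If n is large enough that the full pair list won't fit in memory, you would probably want to sample j on the fly in `__getitem__` instead of precomputing everything.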