ImageFolder data shuffle?

Hello everybody,

Is there a function in PyTorch that shuffles the data before it reaches DataLoader()? I have spent some time searching and still cannot properly shuffle the output of ImageFolder() :face_with_raised_eyebrow:

Thanks,
Anton

Do you really need to shuffle the folders or do you just want to shuffle the data returned from the DataLoader?
If the latter, you could just set shuffle=True in the DataLoader.

I need to shuffle the data itself. I do the train/test split of the ImageFolder() data manually, and some classes do not end up in the train set. Setting shuffle=True in DataLoader() does not solve that :cry:
Is the best solution to read all the images with numpy and then convert them into DataLoader objects? I think I have gotten confused by the data transformations in PyTorch.

Ah ok, you could use a SubsetRandomSampler.
@kevinzakka created a nice example here.
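
In case the linked example is not at hand, here is a minimal sketch of the SubsetRandomSampler idea. It is not the code from that example. The TensorDataset is only a stand-in for the ImageFolder dataset, and the 80/20 split, the batch size and the seed are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Stand-in dataset: 100 fake "images" with labels from 4 classes.
# With real data this would be the ImageFolder instance.
features = torch.randn(100, 3, 8, 8)
labels = torch.arange(100) % 4
dataset = TensorDataset(features, labels)

# Shuffle the indices once, then cut the shuffled list into two parts.
generator = torch.Generator().manual_seed(0)
indices = torch.randperm(len(dataset), generator=generator).tolist()
split = int(0.8 * len(dataset))
train_indices, val_indices = indices[:split], indices[split:]

# Each sampler draws only from its own indices, in a new random order per epoch.
train_loader = DataLoader(dataset, batch_size=16,
                          sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=16,
                        sampler=SubsetRandomSampler(val_indices))
```

Note that sampler and shuffle=True cannot be combined in the same DataLoader; the sampler already takes care of the random order. A purely random cut like this can still leave a rare class out of the train part, so it is worth checking the labels behind train_indices.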

Alternatively, you could write your own ImageFolder, which is also completely straightforward.

An example is given below; it should work quite simply if you shuffle imgs in the __init__. This way you can also do some fancy preprocessing with numpy etc. by writing your own load function and passing it as loader.

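import torch.utils.data as data

# make_dataset and IMG_EXTENSIONS are helpers that are not shown in this post.


class ImageFolder(data.Dataset):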
    """Class for handling image load process and transformations"""

    def __init__(self, image_path, options, transform=None, return_paths=True,
                 loader=default_loader_unaligned):
        """
        Function to create the dataset and initialize the class variables
        :param image_path: path containing image-files
        :param options: class containing all options (args of BaseOptions or subclass)
        :param transform: transformation to apply on the Image after loading it
        :param return_paths: Boolean, True if paths should be returned alongside images , False if only images
        :param loader: function to load and resize images
        """
        imgs = make_dataset(image_path)
        if len(imgs) == 0:
            raise RuntimeError("Found 0 images in: " + image_path + "\n"
                               "Supported image extensions are: " + ",".join(IMG_EXTENSIONS))

        self.root = image_path
        self.imgs = imgs
        self.transform = transform
        self.return_paths = return_paths
        self.loader = loader
        self.options = options

    def __getitem__(self, index):
        """
        Function to get certain item in dataset
        :param index: index of dataset-list
        :return: item in dataset with given index
        """
        path = self.imgs[index]
        # The loader is called with three arguments, so it must accept
        # (path, image_size, input_nc).
        img = self.loader(path, self.options.imageSize, self.options.inputNc)
        if self.transform is not None:
            img = self.transform(img)
        if self.return_paths:
            return img, path
        else:
            return img

    def __len__(self):
        """Function to get number of items in dataset"""
        return len(self.imgs)
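
The class above does not shuffle anything yet. A minimal sketch of the "shuffle imgs in the __init__" step could look like this; shuffle_paths and its seed parameter are hypothetical names used only for illustration, not part of the original code.

```python
import random


def shuffle_paths(paths, seed=None):
    """Return a shuffled copy of `paths`; a fixed seed makes the order reproducible."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Inside __init__ this would replace `self.imgs = imgs`:
#     self.imgs = shuffle_paths(imgs, seed=0)
original = ["a.png", "b.png", "c.png", "d.png"]
example = shuffle_paths(original, seed=0)
```

Working on a copy leaves the list returned by make_dataset untouched, and a fixed seed gives the same order on every run, which helps when the train/test split is cut from the shuffled list afterwards.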

Can you share what default_loader_unaligned is?

Thanks for all of the replies,
Anton

In this case it is simply a function to load an image:

from PIL import Image


def default_loader_unaligned(path):
    """
    Helper function to load an Image with PIL
    :param path: path of image file
    :return: loaded image in RGB mode as PIL Image
    """
    return Image.open(path).convert('RGB')

However, you can modify this function to perform some transformations, and if you need to load aligned image data, it should be modified to return a tuple of image and label.

Thanks again. I will try your solution ASAP.

Thanks. That was a really helpful reply. If possible, could you share some more functions to make this implementation more complete? I think people at a very beginner level will really appreciate that. And I will mark this reply as the solution. :star_struck:

I quickly created a gist. However, the code is not brand new, so I cannot guarantee that it works with the latest PyTorch version. You may have to make some minor changes to get it running.

I am also unsure about the imports, but I think I covered everything I used (and maybe a bit too much).

Did you get the code running with PyTorch 0.4?
If so, could you tell me whether any major changes were necessary to do so?

I didn’t use your whole code. I just copied your approach to add some custom changes to the ImageFolder() source code. In my case, I added numpy.random.shuffle() in ImageFolder() and slightly changed the default loader.
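
For anyone landing here with the original problem (classes missing from the train set after a manual split), a per-class split is another option besides shuffling inside ImageFolder(). This is only a sketch under assumptions: stratified_split, test_fraction and seed are hypothetical names, and targets is assumed to be a plain list with one class index per sample.

```python
import random
from collections import defaultdict


def stratified_split(targets, test_fraction=0.2, seed=0):
    """Split sample indices so every class keeps at least one training sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, target in enumerate(targets):
        by_class[target].append(index)

    train_indices, test_indices = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        # Hold out a fraction of each class, but never the whole class.
        n_test = min(int(len(class_indices) * test_fraction),
                     len(class_indices) - 1)
        test_indices.extend(class_indices[:n_test])
        train_indices.extend(class_indices[n_test:])

    # Mix the classes so the index lists are not grouped by class.
    rng.shuffle(train_indices)
    rng.shuffle(test_indices)
    return train_indices, test_indices


# Toy labels: three classes of very different size.
targets = [0] * 10 + [1] * 5 + [2] * 1
train_indices, test_indices = stratified_split(targets)
```

The two index lists can then be passed to SubsetRandomSampler or torch.utils.data.Subset. With torchvision's ImageFolder, the labels can be collected as [label for _, label in dataset.imgs], assuming imgs holds (path, class_index) pairs as in the torchvision source.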
