How to load images from different folders in the same batch?

Hi, I need to load images from different folders into the same batch. For example, with batch_size=8 I need to load 8 * 3 images: 3 images from each of 8 different folders, all combined into one batch. How can I do this?
I would be grateful for your help!

Hi

It should be enough to define your own torch.utils.data.Dataset subclass that handles the 8 folders simultaneously. I have written a naive, incomplete example below.

Let the folder structure be

- train
    - folder_1
    - ...
    - folder_8

With this layout, you can list all the image files in a folder with os.listdir('train/folder_1').
You can then subclass torch.utils.data.Dataset as below and pass your dataset instance to DataLoader with batch_size=3. Each sample holds one image from every folder, so a batch of 3 samples contains 3 images from each of the 8 folders.

import os
from torch.utils.data import Dataset

class ImageDataSet(Dataset):

    def __init__(self, root='train', image_loader=None, transform=None):
        self.root = root
        self.folders = ['folder_{}'.format(i) for i in range(1, 9)]
        # One sorted list of file names per folder.
        self.image_files = [sorted(os.listdir(os.path.join(self.root, folder))) for folder in self.folders]
        # image_loader: a callable that takes a file path and returns an image.
        self.loader = image_loader
        self.transform = transform

    def __len__(self):
        # Each sample takes one image from every folder, so the smallest folder limits the length.
        return min(len(files) for files in self.image_files)

    def __getitem__(self, index):
        # Load the index-th image of every folder: 8 images per sample.
        images = [self.loader(os.path.join(self.root, folder, files[index])) for folder, files in zip(self.folders, self.image_files)]
        if self.transform is not None:
            images = [self.transform(img) for img in images]
        return images
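
To see what a DataLoader does with a dataset like this, here is a minimal sketch that uses random tensors in place of real images. FakeFolders is a made-up stand-in, not part of the answer above. With the default collate function, a sample that is a list of 8 tensors is batched into a list of 8 tensors (one per folder), so the pieces still have to be concatenated to get a single batch of 24 images.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FakeFolders(Dataset):
    """Hypothetical stand-in: item `index` is a list with one fake image per folder."""

    def __init__(self, num_folders=8, images_per_folder=6):
        self.num_folders = num_folders
        self.images_per_folder = images_per_folder

    def __len__(self):
        return self.images_per_folder

    def __getitem__(self, index):
        # One 3x4x4 tensor per folder, standing in for a transformed image.
        return [torch.rand(3, 4, 4) for _ in range(self.num_folders)]


loader = DataLoader(FakeFolders(), batch_size=3)
per_folder = next(iter(loader))        # list of 8 tensors, each of shape [3, 3, 4, 4]
batch = torch.cat(per_folder, dim=0)   # one tensor of shape [24, 3, 4, 4]
```

The concatenated batch is ordered folder by folder: the first 3 images come from the first folder, the next 3 from the second, and so on.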

Thanks for your help! I am quite new to PyTorch, and this really helps me a lot!

@Harry-675
I’m really sorry, the code above might not work (I’m not sure).

If it doesn’t work, please try the following.
A more naive solution is to prepare a Dataset and a DataLoader for each folder and then loop over all the dataloaders together, as in the thread "Train simultaneously on two datasets". This works if you don’t care about the order of sampling within each folder.

So,

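import os
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

class ImageData(Dataset):
    def __init__(self, root='train/folder_1', loader=None, transform=None):
        self.root = root
        self.files = os.listdir(self.root)
        # loader: a callable that takes a file path and returns an image.
        self.loader = loader
        self.transform = transform
    def __len__(self):
        return len(self.files)
    def __getitem__(self, index):
        image = self.loader(os.path.join(self.root, self.files[index]))
        if self.transform is not None:
            image = self.transform(image)
        return image

# image_load_func is a function you supply that reads an image from a path.
# ToTensor turns each image into a tensor so the DataLoader can stack them.
loaders = [DataLoader(ImageData('train/folder_{}'.format(i), loader=image_load_func, transform=transforms.ToTensor()), batch_size=3)
           for i in range(1, 9)]  # loader_1 ... loader_8

# zip stops as soon as the shortest loader is exhausted.
for batch in zip(*loaders):
    # Each loader yields 3 images; concatenating gives 24 images per batch.
    batch = torch.cat(batch, dim=0)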

Hi, how do I define image_load_func? (I just got started.)
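
For image_load_func, any callable that takes a file path and returns an image should do. Here is a minimal sketch, assuming Pillow is installed and that you want RGB images. The function body is only an illustration, not code from the posts above.

```python
from PIL import Image


def image_load_func(path):
    """Open the image file at `path` and return it as an RGB PIL image."""
    # Converting inside the `with` block makes sure the pixel data
    # is read before the file handle is closed.
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')
```

The result is a PIL image, so a transform such as transforms.ToTensor() is still needed before a DataLoader can stack the images into a batch.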

If anyone is looking for a different method, torchvision has a utility to accomplish this task cleanly.
https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.ImageFolder

There’s a small caveat with this, though: images in a particular batch are not guaranteed to come from different classes.

It assumes the images are arranged in the following manner:

root_dir/
    class_1/
        001.png
        002.png
        ...
    class_2/
        001.png
        002.png
        ...
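    ...

from torchvision import datasets, transforms
from torch.utils import data

root_dir = 'root_dir'  # path to the top-level folder shown above

# Each sub-folder of root_dir becomes one class; ToTensor converts PIL images to tensors.
dataset = datasets.ImageFolder(root=root_dir, transform=transforms.ToTensor())

# shuffle=True mixes images from all classes, so a batch may repeat a class.
loader = data.DataLoader(dataset, batch_size=8, shuffle=True)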
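
If the caveat about mixed classes matters, one possible workaround is a custom batch sampler that picks the indices for each batch itself. The sketch below is plain Python and only an illustration: FolderBalancedBatchSampler and its arguments are made-up names, and it assumes you can supply a list with the folder (class) index of every sample, such as the targets attribute of an ImageFolder dataset.

```python
import random
from collections import defaultdict


class FolderBalancedBatchSampler:
    """Yield index lists with `per_folder` images from each of `folders_per_batch` folders.

    `labels[i]` is the folder (class) index of sample i. Hypothetical helper.
    """

    def __init__(self, labels, folders_per_batch=8, per_folder=3, seed=None):
        self.by_folder = defaultdict(list)
        for index, label in enumerate(labels):
            self.by_folder[label].append(index)
        self.folders_per_batch = folders_per_batch
        self.per_folder = per_folder
        self.rng = random.Random(seed)

    def __iter__(self):
        # Shuffle each folder's indices and cut them into full groups of `per_folder`.
        groups = {}
        for label, indices in self.by_folder.items():
            shuffled = indices[:]
            self.rng.shuffle(shuffled)
            stop = len(shuffled) - self.per_folder + 1
            groups[label] = [shuffled[i:i + self.per_folder]
                             for i in range(0, stop, self.per_folder)]
        # Keep drawing batches while enough folders still have a full group left.
        while True:
            available = [label for label, rest in groups.items() if rest]
            if len(available) < self.folders_per_batch:
                return
            batch = []
            for label in self.rng.sample(available, self.folders_per_batch):
                batch.extend(groups[label].pop())
            yield batch
```

It could then be passed to the loader as data.DataLoader(dataset, batch_sampler=FolderBalancedBatchSampler(dataset.targets)). Note that batch_sampler cannot be combined with batch_size or shuffle, and leftover images that do not fill a group of 3 are skipped in that epoch.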

Thanks a lot, this worked for me!

@vaasudev96 How do I use root with the custom dataset (ImageDataSet) defined by @crcrpar?

loader_1 = DataLoader(ImageData.ImageFolder(root='train/folder_1' , batch_size=3)