How to load images from different folders in the same batch?

Hi, I need to load images from different folders. For example, with batch_size=8 I need to load 8 * 3 = 24 images from 8 different folders, 3 images from each folder, all combined into one batch. How can I do this?
I will be grateful for your help!


It would be enough to define your own Dataset subclass that handles the 8 folders simultaneously. I write down one naive, incomplete example below.

Let the folder structure be

- train
    - folder_1
    - ...
    - folder_8

In the above setting, you can list all the image files in each folder with os.listdir('train/folder_1').
You can also override the Dataset class as below and pass your dataset instance to a DataLoader with batch_size=3: each sample already holds 8 images (one per folder), so a batch of 3 samples gives the 24 images you want.

import os

from torch.utils.data import Dataset


class ImageDataSet(Dataset):

    def __init__(self, root='train', image_loader=None, transform=None):
        self.root = root
        # One list of file names per folder (folder_1 ... folder_8).
        self.image_files = [
            os.listdir(os.path.join(self.root, 'folder_{}'.format(i)))
            for i in range(1, 9)
        ]
        self.loader = image_loader
        self.transform = transform

    def __len__(self):
        # Each sample takes one image from every folder, so the length
        # is bounded by the smallest folder.
        return min(len(files) for files in self.image_files)

    def __getitem__(self, index):
        images = [
            self.loader(os.path.join(self.root, 'folder_{}'.format(i + 1),
                                     self.image_files[i][index]))
            for i in range(8)
        ]
        if self.transform is not None:
            images = [self.transform(img) for img in images]
        return images

Thanks for your help! I am quite new to PyTorch; this really helps me a lot!

I’m really sorry, it might not work (I’m not sure). :bowing_man:

If the above doesn’t work, please try the below.
A more naive solution is to prepare a Dataset and a DataLoader for each folder. Then you loop over all the dataloaders together, as in the thread “Train simultaneously on two datasets”, if you don’t care about the order of sampling within each folder.


import os

import torch
from torch.utils.data import Dataset, DataLoader


class ImageData(Dataset):
    def __init__(self, root='train/folder_1', loader=image_load_func, transform=None):
        self.root = root
        self.files = os.listdir(self.root)
        self.loader = loader
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        return self.transform(self.loader(os.path.join(self.root, self.files[index])))


loader_1 = DataLoader(ImageData('train/folder_1'), batch_size=3)
# ... one DataLoader per folder ...
loader_8 = DataLoader(ImageData('train/folder_8'), batch_size=3)

for batches in zip(loader_1, ..., loader_8):
    # Concatenate the 8 per-folder batches into a single batch of 24 images.
    batch =, dim=0)

Hi, how do I define the image_load_func? (I just got started :confused:)

If anyone is looking for a different method, torchvision has a utility to accomplish this task cleanly.

There’s a small caveat with this, though: images in a particular batch are not guaranteed to come from different classes.

It assumes the images are arranged with one sub-folder per class, in the following manner:

- root_dir
    - folder_1
    - ...
    - folder_8

from torchvision import datasets, transforms
from torch.utils import data

dataset = datasets.ImageFolder(root = root_dir, 
                transform = transforms.ToTensor())

loader = data.DataLoader(dataset, batch_size = 8, shuffle = True)
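If you do need every batch to contain exactly 3 images from each of the 8 classes, one option is to hand the DataLoader a precomputed list of batches via its batch_sampler argument. Below is a sketch of such a helper (the function name is mine; it assumes you have the per-sample class labels, e.g. dataset.targets for an ImageFolder):

```python
import random
from collections import defaultdict

def class_balanced_batches(labels, per_class=3):
    """Build a list of index batches with `per_class` samples per class."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Shuffle within each class so batches vary between epochs.
    for indices in by_class.values():
        random.shuffle(indices)
    # The smallest class limits how many full batches we can build.
    n_batches = min(len(v) for v in by_class.values()) // per_class
    batches = []
    for b in range(n_batches):
        batch = []
        for indices in by_class.values():
            batch.extend(indices[b * per_class:(b + 1) * per_class])
        batches.append(batch)
    return batches
```

Usage would then look like `data.DataLoader(dataset, batch_sampler=class_balanced_batches(dataset.targets, per_class=3))`; note that leftover samples in larger classes are simply dropped each epoch.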

Thanks a lot, this worked for me!

@vaasudev96 How do I use root with the custom dataset (ImageDataSet) defined by @crcrpar? I tried this:

loader_1 = DataLoader(ImageData.ImageFolder(root='train/folder_1'), batch_size=3)