Does Concatenate Datasets preserve class labels and indices?

Let’s say I have 2 image folder datasets and I want to concatenate them.

The first dataset has 100 images with 2 equal classes: “Dog” and “Cat” with class indices 0 and 1
The second dataset has 120 images with 3 equal classes: “Dog”, “Cat” and “Pig” with class indices 0, 1 and 2

When I concatenate the two datasets with torch.utils.data.ConcatDataset(), will I get a dataset with 90 dog, 90 cat and 40 pig images? Or does it treat the dog and cat labels from the two datasets as different classes, so that I actually end up with 5 classes? I didn’t find an easy way to check because the ConcatDataset class doesn’t have a classes or class_to_idx attribute.

And what if a third dataset had only the classes “dog” and “pig” and I concatenated it with the first and second? Would ConcatDataset() work out that pig should have class index 2 instead of 1?

ConcatDataset will not create a mapping; it just indexes into the passed Datasets.
Each Dataset should make sure to yield the “right” labels.
E.g. in your second use case, you should make sure that the dog-and-cat dataset only yields samples with the class labels 0 and 1 (dog and cat), while the dog-and-pig dataset only yields 0 and 2 (dog and pig).

The mapping is defined by your Dataset implementation.
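
To illustrate that ConcatDataset only chains the datasets and passes labels through unchanged, here is a minimal sketch. ToyDataset and its hard-coded labels are hypothetical stand-ins for real image datasets, not part of PyTorch.

```python
import torch
from torch.utils.data import ConcatDataset, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for an image dataset: returns (sample, label) pairs."""

    def __init__(self, labels):
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # A dummy tensor stands in for the image; the label is returned as stored.
        return torch.zeros(1), self.labels[idx]


ds_a = ToyDataset([0, 1])      # e.g. dog=0, cat=1
ds_b = ToyDataset([0, 1, 2])   # e.g. dog=0, cat=1, pig=2
combined = ConcatDataset([ds_a, ds_b])

# Indices 0-1 come from ds_a and indices 2-4 from ds_b; labels are not remapped.
labels = [combined[i][1] for i in range(len(combined))]
print(labels)  # [0, 1, 0, 1, 2]
```

So if both datasets agree on what each index means, the concatenated dataset has 3 classes, not 5. If they disagree, nothing in ConcatDataset will notice.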

Note that if you are using e.g. ImageFolder, the mapping will be created based on the folders, so I would not recommend using this approach if your dataset folders do not contain the same classes.
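
To see why differing folders cause trouble: ImageFolder assigns indices by sorting the class folder names and numbering them from 0. The sketch below only mimics that convention in plain Python (it is not torchvision's code), and folder_class_to_idx, make_root and the D1/D3 roots are hypothetical names used for illustration.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Mimic the ImageFolder convention: sorted subfolder names -> 0, 1, 2, ..."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def make_root(parent, name, class_names):
    """Create a hypothetical dataset root with one (empty) subfolder per class."""
    root = os.path.join(parent, name)
    for cls in class_names:
        os.makedirs(os.path.join(root, cls))
    return root


with tempfile.TemporaryDirectory() as tmp:
    map_1 = folder_class_to_idx(make_root(tmp, "D1", ["dog", "cat"]))
    map_3 = folder_class_to_idx(make_root(tmp, "D3", ["dog", "pig"]))

print(map_1)  # {'cat': 0, 'dog': 1}
print(map_3)  # {'dog': 0, 'pig': 1}
```

Here index 1 means “dog” in one dataset and “pig” in the other, so concatenating them as-is would silently mix the two classes.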

I see, thanks. For a quick hack I ended up creating empty folders with all required classes in my image dataset root.

How do I get the classes if I have three folders with the same class inside each folder?

If you are using ImageFolder, you can access its dataset.class_to_idx attribute to see the mapping between the folders and the class indices.
I’m not sure if I misunderstand the question, but do all three folders contain images from the same class (one class only)?
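
ConcatDataset itself has no classes or class_to_idx, but it keeps the wrapped datasets in its datasets attribute, so the labels can be inspected per part. A minimal sketch, assuming each part exposes a targets list the way ImageFolder does. LabeledDataset is a hypothetical stand-in, with sizes taken from the first post.

```python
from collections import Counter

from torch.utils.data import ConcatDataset, Dataset


class LabeledDataset(Dataset):
    """Hypothetical dataset exposing `targets`, as ImageFolder does."""

    def __init__(self, targets):
        self.targets = list(targets)

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        return f"sample_{idx}", self.targets[idx]


first = LabeledDataset([0] * 50 + [1] * 50)              # 50 dog, 50 cat
second = LabeledDataset([0] * 40 + [1] * 40 + [2] * 40)  # 40 dog, 40 cat, 40 pig
combined = ConcatDataset([first, second])

# ConcatDataset keeps its parts in `.datasets`; add up their label counts.
counts = Counter()
for part in combined.datasets:
    counts.update(part.targets)

print(sorted(counts.items()))  # [(0, 90), (1, 90), (2, 40)]
```

With real ImageFolder parts, printing part.class_to_idx inside the same loop would show whether the mappings agree.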

No, there are two classes…
Thank you for responding :smile:

Hi, what would be the right approach to concatenate 2 datasets having different classes/labels? For example, dataset D1 has folders for “cat” and “dog”, whereas dataset D2 has folders like “elephant” and “lion”. Right now I have created empty folders named “elephant” and “lion” in dataset D1 (and vice versa) to preserve the labels before using ImageFolder and ConcatDataset as below:

    import torch
    from torchvision import datasets

    # `transform` is defined elsewhere
    ds = torch.utils.data.ConcatDataset([
        datasets.ImageFolder('./data/D1/train', transform=transform),
        datasets.ImageFolder('./data/D2/train', transform=transform),
    ])

Your approach sounds alright, if you want to use the ImageFolder datasets.
Alternatively, you could write a custom Dataset and return the corresponding targets for both datasets, which wouldn’t rely on creating empty folders.
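
One possible shape for such a custom Dataset is a thin wrapper that translates each dataset's local indices to a shared mapping by class name. This is only a sketch under assumptions: RemappedDataset and ListDataset are hypothetical names, and ListDataset stands in for ImageFolder so the example runs without image files. The wrapper also avoids relying on empty class folders, which newer torchvision releases may reject (worth checking against your installed version).

```python
from torch.utils.data import ConcatDataset, Dataset


class RemappedDataset(Dataset):
    """Wrap a dataset and translate its local class indices to shared ones."""

    def __init__(self, dataset, local_class_to_idx, global_class_to_idx):
        self.dataset = dataset
        # local index -> global index, matched by class name
        self.index_map = {
            local_idx: global_class_to_idx[name]
            for name, local_idx in local_class_to_idx.items()
        }

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sample, target = self.dataset[idx]
        return sample, self.index_map[target]


class ListDataset(Dataset):
    """Hypothetical stand-in for ImageFolder, backed by (sample, target) pairs."""

    def __init__(self, items, class_to_idx):
        self.items = items
        self.class_to_idx = class_to_idx

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


d1 = ListDataset([("cat.jpg", 0), ("dog.jpg", 1)], {"cat": 0, "dog": 1})
d2 = ListDataset([("elephant.jpg", 0), ("lion.jpg", 1)], {"elephant": 0, "lion": 1})

# Shared mapping built from the union of class names, sorted for determinism.
all_classes = sorted(set(d1.class_to_idx) | set(d2.class_to_idx))
global_map = {name: idx for idx, name in enumerate(all_classes)}

combined = ConcatDataset(
    [RemappedDataset(d, d.class_to_idx, global_map) for d in (d1, d2)]
)
targets = [combined[i][1] for i in range(len(combined))]
print(global_map)  # {'cat': 0, 'dog': 1, 'elephant': 2, 'lion': 3}
print(targets)     # [0, 1, 2, 3]
```

With real data, each ListDataset would be replaced by an ImageFolder, whose class_to_idx attribute supplies the local mapping.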

Thanks! It would be great if you could point me to some examples of how to do it, as I am new to PyTorch.

This tutorial shows how to write a custom Dataset and might be helpful. :wink: