Does Concatenate Datasets preserve class labels and indices?

OK, so suppose I want to:

take the union of the labels and relabel them from scratch, from 0 to n1 + n2 - 1, where n1 and n2 are the number of labels in data set 1 and data set 2.

Then I have to implement my own version of a union/concat/merge data set that takes this into account, likely by re-implementing __getitem__(idx: int) as something like this:

def __getitem__(self, index: int):
    # leave the sampled labels of data set 1 as is
    img, target = self.mnist[index], int(self.mnist.targets[index])

    # to the sampled labels of data set 2 add an offset so they come after data set 1's labels
    img, target = self.cifar10[index], int(self.cifar10.targets[index]) + len(self.mnist)
    return ...

Darn, this isn’t quite right… Based on the index, the data set I implement should know which mapping applies. It also doesn’t work for an arbitrary union of data sets, of course. Then one likely needs a bisect function to find which interval idx falls in, so that you know which data set it belongs to and what offset to add.
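Here is a rough sketch of that bisect idea, in case it helps. It is not a library API: the class name RelabeledUnion and the num_classes argument are made up for illustration, and it assumes every data set returns (x, y) with an integer y in [0, num_classes). It is written against plain Python sequences so it runs without torch; in practice you would probably subclass torch.utils.data.Dataset.

```python
import bisect
from itertools import accumulate


class RelabeledUnion:
    """Union of map-style data sets with labels shifted so they never collide.

    Hypothetical helper: each data set must support len() and ds[i] -> (x, y).
    """

    def __init__(self, datasets, num_classes):
        assert len(datasets) == len(num_classes)
        self.datasets = list(datasets)
        # cumulative_sizes[k] = number of samples in datasets[0..k]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))
        # label_offsets[k] = number of classes in datasets[0..k-1]
        self.label_offsets = [0] + list(accumulate(num_classes))[:-1]

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # k = first data set whose cumulative size exceeds index
        k = bisect.bisect_right(self.cumulative_sizes, index)
        local = index - (self.cumulative_sizes[k - 1] if k > 0 else 0)
        x, y = self.datasets[k][local]
        # shift the label by the number of classes in all earlier data sets
        return x, int(y) + self.label_offsets[k]


# Toy usage with plain lists standing in for real data sets.
a = [("a0", 0), ("a1", 1), ("a2", 0)]  # 2 classes
b = [("b0", 0), ("b1", 2)]             # 3 classes
u = RelabeledUnion([a, b], num_classes=[2, 3])
```

Note that the label offset is the number of classes in the earlier data sets, not their number of samples; the sample counts are only used to locate which data set an index falls in.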

I think the easiest is to wrap your normal data set in learn2learn’s MetaDataset and then pass it to their UnionMetaDataset.

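import learn2learn as l2l
from learn2learn.data import UnionMetaDataset

# CIFARFS ships with learn2learn (l2l.vision.datasets), not torchvision
train = l2l.vision.datasets.CIFARFS(root="/tmp/mnist", mode="train")
train = l2l.data.MetaDataset(train)
valid = l2l.vision.datasets.CIFARFS(root="/tmp/mnist", mode="validation")
valid = l2l.data.MetaDataset(valid)
test = l2l.vision.datasets.CIFARFS(root="/tmp/mnist", mode="test")
test = l2l.data.MetaDataset(test)
# the three splits have disjoint classes, so the union has all 100 labels
union = UnionMetaDataset([train, valid, test])
assert len(union.labels) == 100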

class UnionMetaDataset(MetaDataset):
    """
    **Description**

    Takes multiple MetaDatasets and constructs their union.

    Note: The labels of all datasets are remapped to be in consecutive order.
    (i.e. the same label in two datasets will be mapped to two different labels in the union)

    **Arguments**

    * **datasets** (list of Dataset) - A list of torch Datasets.

link: learn2learn.data - learn2learn

Actually, it’s likely easier to preprocess the data point indices so that each one maps to its required label, instead of bisecting: as you loop through each data set you know this value and only need to keep a single counter.
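A minimal sketch of that preprocessing, again with made-up names (build_relabeled_index, PrecomputedUnion) and assuming each data set returns (x, y) pairs. One caveat: new labels are handed out in order of first appearance, so the relative order of labels inside a data set is not necessarily kept.

```python
def build_relabeled_index(datasets):
    """Single pass over all data sets.

    Returns:
        samples:   list of (dataset_id, local_index, new_label)
        label_map: dict mapping (dataset_id, old_label) -> new_label
    """
    samples = []
    label_map = {}
    next_label = 0  # the single running counter shared by all data sets
    for ds_id, ds in enumerate(datasets):
        for local in range(len(ds)):
            _, y = ds[local]
            key = (ds_id, int(y))
            if key not in label_map:
                label_map[key] = next_label
                next_label += 1
            samples.append((ds_id, local, label_map[key]))
    return samples, label_map


class PrecomputedUnion:
    """Hypothetical union data set backed by the precomputed table."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self.samples, self.label_map = build_relabeled_index(self.datasets)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # plain table lookup, no bisecting needed
        ds_id, local, label = self.samples[index]
        x, _ = self.datasets[ds_id][local]
        return x, label


# Toy usage with plain lists standing in for real data sets.
a = [("a0", 0), ("a1", 1), ("a2", 0)]
b = [("b0", 0), ("b1", 2)]
u = PrecomputedUnion([a, b])
```

The trade-off is that the pass in build_relabeled_index loads every sample once up front. If your data sets expose their labels directly (many torchvision data sets keep them in a targets attribute), reading labels from there instead would likely be much cheaper.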
