I created a custom Dataset, and in my __init__ I replaced the classes with my own by calling a custom _find_classes method. (I wanted to use subfolders, and concatenate their names with their parents'.) This took my class count from roughly 30 up to 964.
from torch.utils.data import Dataset
from torchvision import datasets

class MyDataset(Dataset):
    def __init__(self, image_path, transform=None):
        super(MyDataset, self).__init__()
        self.data = datasets.ImageFolder(image_path, transform)
        self.data.classes, self.data.class_to_idx = self._find_classes(image_path)

    def _find_classes(self, dir):
        # Custom labels: returns (classes, class_to_idx) built from the subfolders
When I inspect class_to_idx, I have the 964 classes I expect, but when I iterate over a DataLoader, it gives me the old labels: if I print them, nothing is higher than 29, i.e. the old mapping to the parent folders.
train_dataloader = DataLoader(data, batch_size=32, shuffle=True, num_workers=0)
data_iter = iter(train_dataloader)
images, labels = next(data_iter)  # data_iter.next() only works in Python 2
print(labels)
out:
tensor([ 0, 2, 19, 19, 29, 20, 3, 20, 22, 27, 1, 18, 4, 20, 17, 1, 3, 23,
20, 18, 6, 29, 6, 18, 9, 24, 8, 29, 13, 19, 21, 14])
And if I print data.targets, it shows me the old mapping, from before I customized the labels.
It seems I changed the classes on my dataset, but not its targets. Clearly my _find_classes has nothing to do with the _find_classes being called by ImageFolder, and I'm changing classes after the dataset has already been built. What is a better, more efficient way to do this?
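For context on why the targets stay stale: ImageFolder scans the directory tree once in its constructor and bakes the class_to_idx mapping into self.samples / self.targets at that point, so the label indices have to come from the custom mapping at scan time, not afterwards. Here is a minimal, torchvision-free sketch of that idea; the two-level folder layout and the "parent_child" naming are hypothetical stand-ins for my real data:

```python
import os
import tempfile

def find_classes(root):
    # Build "parent_child" class names from a two-level folder layout.
    # Assumes parent folder names contain no underscore.
    classes = sorted(
        f"{parent}_{child}"
        for parent in os.listdir(root)
        if os.path.isdir(os.path.join(root, parent))
        for child in os.listdir(os.path.join(root, parent))
        if os.path.isdir(os.path.join(root, parent, child))
    )
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    return classes, class_to_idx

def make_samples(root, class_to_idx):
    # Assign every file its index from the custom mapping while scanning,
    # which is what ImageFolder does internally with its own mapping.
    samples = []
    for cls, idx in class_to_idx.items():
        parent, child = cls.split("_", 1)
        folder = os.path.join(root, parent, child)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), idx))
    return samples

# Hypothetical layout: root/a/x, root/a/y, root/b/z, one file each.
root = tempfile.mkdtemp()
for parent, child in [("a", "x"), ("a", "y"), ("b", "z")]:
    d = os.path.join(root, parent, child)
    os.makedirs(d)
    open(os.path.join(d, "img.jpg"), "w").close()

classes, class_to_idx = find_classes(root)
samples = make_samples(root, class_to_idx)
print(classes)                  # ['a_x', 'a_y', 'b_z']
print([t for _, t in samples])  # targets drawn from the new mapping: [0, 1, 2]
```

So mutating self.data.classes after construction changes only those two attributes; the already-built samples/targets lists keep the old indices.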