I created a custom Dataset, and in my __init__ I replaced the classes with my own by calling a custom _find_classes method. (I wanted to use subfolders, and concatenate their names with their parents'.) This took my class count from roughly 30 up to 964.
from torch.utils.data import Dataset
from torchvision import datasets

class MyDataset(Dataset):
    def __init__(self, image_path, transform=None):
        super(MyDataset, self).__init__()
        self.data = datasets.ImageFolder(image_path, transform)
        self.data.classes, self.data.class_to_idx = self._find_classes(image_path)

    def _find_classes(self, dir):
        # Custom labels: returns (classes, class_to_idx) built from the subfolders
When I inspect class_to_idx, I have the 964 classes I expect, but when I iterate over a DataLoader, it gives me the old labels: if I print them, nothing is higher than 29, i.e. the old mapping to the parent folders.
train_dataloader = DataLoader(data, batch_size=32, shuffle=True, num_workers=0)
data_iter = iter(train_dataloader)
images, labels = next(data_iter)  # data_iter.next() only works in Python 2
print(labels)
out:
tensor([ 0, 2, 19, 19, 29, 20, 3, 20, 22, 27, 1, 18, 4, 20, 17, 1, 3, 23,
20, 18, 6, 29, 6, 18, 9, 24, 8, 29, 13, 19, 21, 14])
And if I print data.targets, it shows me the old mapping, from before I customized the labels.
It seems I changed the classes on my dataset, but not its targets. Clearly my _find_classes has nothing to do with the _find_classes being called by ImageFolder, and I'm changing classes after the dataset has already been built. What is a better, more efficient way to do this?
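For context on why the targets stay stale: ImageFolder scans the directory tree once in its constructor and bakes the class_to_idx mapping into self.samples / self.targets at that point, so the label indices have to come from the custom mapping at scan time, not afterwards. Here is a minimal, torchvision-free sketch of that idea; the two-level folder layout and the "parent_child" naming are hypothetical stand-ins for my real data:

```python
import os
import tempfile

def find_classes(root):
    # Build "parent_child" class names from a two-level folder layout.
    # Assumes parent folder names contain no underscore.
    classes = sorted(
        f"{parent}_{child}"
        for parent in os.listdir(root)
        if os.path.isdir(os.path.join(root, parent))
        for child in os.listdir(os.path.join(root, parent))
        if os.path.isdir(os.path.join(root, parent, child))
    )
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    return classes, class_to_idx

def make_samples(root, class_to_idx):
    # Assign every file its index from the custom mapping while scanning,
    # which is what ImageFolder does internally with its own mapping.
    samples = []
    for cls, idx in class_to_idx.items():
        parent, child = cls.split("_", 1)
        folder = os.path.join(root, parent, child)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), idx))
    return samples

# Hypothetical layout: root/a/x, root/a/y, root/b/z, one file each.
root = tempfile.mkdtemp()
for parent, child in [("a", "x"), ("a", "y"), ("b", "z")]:
    d = os.path.join(root, parent, child)
    os.makedirs(d)
    open(os.path.join(d, "img.jpg"), "w").close()

classes, class_to_idx = find_classes(root)
samples = make_samples(root, class_to_idx)
print(classes)                  # ['a_x', 'a_y', 'b_z']
print([t for _, t in samples])  # targets drawn from the new mapping: [0, 1, 2]
```

So mutating self.data.classes after construction changes only those two attributes; the already-built samples/targets lists keep the old indices.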