Range of ImageNet 2012 training set labels

fourleafclover · January 5, 2023, 6:13pm

I downloaded the ImageNet 2012 training and validation datasets and am using them to train a ResNet-50 model. At first, a CUDA device-side assert was triggered during training, and after some searching I found the cause: the targets provided by my training set dataloader are in the range [1, 1000], while nn.CrossEntropyLoss, which I use for training, expects class-index targets in [0, 999]. Simply subtracting 1 from all targets provided by the training set dataloader (as suggested here) solved the problem.
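To make the range mismatch concrete, here is a minimal sketch in plain Python of a check that could be run on a batch of targets before they reach the loss. The helper names check_target_range and shift_to_zero_based are hypothetical, not part of PyTorch, and the targets are assumed to be plain integers (for a tensor, something like targets.tolist() would be needed first).

```python
def check_target_range(targets, num_classes):
    """Return (min, max) of the targets.

    Raises ValueError if any target lies outside [0, num_classes - 1],
    which is the range of class indices a 1000-way classifier can accept.
    """
    targets = list(targets)
    lo, hi = min(targets), max(targets)
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"targets span [{lo}, {hi}], expected [0, {num_classes - 1}]"
        )
    return lo, hi


def shift_to_zero_based(targets):
    """The workaround from the post: subtract 1 from every target."""
    return [t - 1 for t in targets]
```

Running a check like this on the CPU gives a readable error instead of a device-side assert. Note that the subtraction only hides the symptom; it does not explain why the indices start at 1.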

While this is nice, I would like to understand a couple of things that are still mysterious to me:

1. Why does the training set dataloader provide targets that are in the incorrect range?
2. Why does only the dataloader for the training set provide targets that are in the incorrect range? For the targets provided by the validation set, subtracting one is not necessary.

As far as I understand, the range of the targets is defined by how DatasetFolder “builds” the dataset. More specifically, it depends on the make_dataset() function, which in turn calls find_classes() to get a dictionary class_to_idx that maps WordNet IDs (e.g. n01440764) to indices.

When I use find_classes() and pass to it the directory where I unpacked the ImageNet training set, the returned class_to_idx dictionary has values in the range of [0, 999], which seems right.
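For reference, the mapping described above can be sketched with the standard library alone. This is a stand-in that mirrors the behaviour as I understand it (subdirectory names sorted, then enumerated from 0), not torchvision's own function; sketch_find_classes is a hypothetical name.

```python
import os


def sketch_find_classes(directory):
    """Map each subdirectory name to an index, in sorted order, starting at 0."""
    # Only directories count as classes; loose files in the root are ignored.
    classes = sorted(
        entry.name for entry in os.scandir(directory) if entry.is_dir()
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx
```

With exactly 1000 class folders this yields indices 0 to 999, so the index assigned to a given WordNet ID depends entirely on which folders are present in the directory that is scanned.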

Does someone know what I am missing?
Thanks in advance

ptrblck · January 5, 2023, 11:15pm

I guess your root folder contains an unneeded and unexpected folder, since the class index mapping is built from the folders found in the root. torchvision.datasets.ImageNet initializes its base class, ImageFolder, which eventually calls into find_classes, enumerating the folders starting at class index 0.
Same as above: check the root folder and make sure it contains 1000 subfolders.
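A quick way to run that check is sketched below, using only the standard library. It assumes the class folders are named by WordNet ID in the form "n" followed by eight digits (as in n01440764); audit_class_folders and WNID_PATTERN are hypothetical names, and the root path is whatever directory you pass to the dataset.

```python
import os
import re

# Assumption: ImageNet class folders look like "n01440764".
WNID_PATTERN = re.compile(r"n\d{8}")


def audit_class_folders(root, expected=1000):
    """Count the subfolders of root and list any that are not WordNet IDs."""
    subdirs = sorted(e.name for e in os.scandir(root) if e.is_dir())
    unexpected = [name for name in subdirs if not WNID_PATTERN.fullmatch(name)]
    return {
        "num_subdirs": len(subdirs),
        "unexpected": unexpected,
        "count_ok": len(subdirs) == expected,
    }
```

If the audit reports an extra folder whose name sorts before the "n..." folders, every real class would be pushed up by one index, which would match targets in [1, 1000].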

fourleafclover · January 6, 2023, 10:11am

This very much makes sense, thanks a lot for your swift reply!