I want to use the pretrained models from PyTorch and evaluate them on the ImageNet val data. That should be fairly straightforward, but I am getting stuck on the dataloader.
I downloaded ImageNet, and the folder structure I have looks like this:
I changed path_to_imagenet to /media/SSD2/ILSVRC/ like this:
torchvision.datasets.ImageNet('/media/SSD2/ILSVRC/',split='val',download=False)
but I get this error:
RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in /media/SSD2/ILSVRC/.
Then I tried using datasets.ImageFolder in the following way:
I don’t know how and where you’ve downloaded the ImageNet dataset, but on my system each class uses a separate subfolder, which is needed in order to find all classes.
E.g. the val folder contains:
I think @seyeeet probably downloaded the ImageNet dataset from Kaggle: ImageNet Object Localization Challenge | Kaggle. It's no longer possible to download it from the original location on the ImageNet website; the website now only refers to the dataset that is hosted on Kaggle. I also downloaded it there, and I get the same error when loading it via the torchvision.datasets.ImageNet class.
Are we just using it wrong, or has the layout of the ImageNet dataset changed? The error message says that the dataset class cannot find ILSVRC2012_devkit_t12.tar.gz, and Kaggle's version of the dataset does not actually contain such an archive.
I guess Kaggle might have changed the data layout and if so I would assume there would be PyTorch scripts to load this new dataset type.
Based on the previous output it seems as if the images are just stored in the test/train/val folders without any subfolders. In that case ImageFolder wouldn’t be compatible, since the class indices won’t be created. If so, I would also assume that a target file is provided additionally.
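If such a target file is provided, the flat val folder can be regrouped into one subfolder per class so that ImageFolder can infer the classes. Here is a minimal sketch, assuming a CSV with an ImageId column and a PredictionString column whose first token is the class (WordNet) ID. That is how I understand Kaggle's LOC_val_solution.csv to be laid out, so please check it against your copy. The function name and arguments are made up for illustration.

```python
import csv
import shutil
from pathlib import Path


def group_val_by_class(val_dir, labels_csv, ext=".JPEG"):
    """Move flat validation images into one subfolder per class ID.

    Assumes labels_csv has the columns ImageId and PredictionString, and
    that the first whitespace-separated token of PredictionString is the
    class ID (an assumption about the label file, not a documented API).
    Returns the number of images moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            class_id = row["PredictionString"].split()[0]
            src = val_dir / (row["ImageId"] + ext)
            if not src.is_file():
                # Already moved or missing: skip so the function can be re-run.
                continue
            class_dir = val_dir / class_id
            class_dir.mkdir(exist_ok=True)
            shutil.move(str(src), str(class_dir / src.name))
            moved += 1
    return moved
```

Since this moves files in place, it is probably safer to try it on a copy of a few images first.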
Thanks, you are right, there’s actually an example script in an official pytorch repo that shows how ImageNet data can be accessed by using ImageFolder, which works for the version of the dataset that’s available on kaggle: examples/imagenet at master · pytorch/examples · GitHub
That said, I haven't used the script directly. I simply reused the part of the code that sets up the dataloaders, which starts here: examples/main.py at master · pytorch/examples · GitHub, and that works for me.
Just make sure that you follow the setup instructions listed here: examples/imagenet at master · pytorch/examples · GitHub. In particular, you have to execute the shell script mentioned in the last step inside the val folder (ILSVRC/Data/CLS-LOC/val/). This ensures that the val folder is structured in a way that the ImageFolder class can understand.
I ran into the same problem and solved it with https://github.com/fh295/semanticCNN
Specifically, to give the val set the same structure as the train set, run the following commands. They move all validation images into their corresponding class subfolders.
cd val/
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
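After the images have been moved, it may be worth checking that every one of them ended up in a class subfolder. Here is a small sketch using only the standard library. The function name is made up for illustration, and it assumes the images use the .JPEG extension.

```python
from pathlib import Path


def summarize_val_layout(val_dir, ext=".JPEG"):
    """Return (class_count, images_in_subfolders, images_left_at_top_level)."""
    val_dir = Path(val_dir)
    class_dirs = [p for p in val_dir.iterdir() if p.is_dir()]
    # Images that sit inside a class subfolder (what ImageFolder will see).
    nested = sum(1 for d in class_dirs for p in d.iterdir() if p.suffix == ext)
    # Images still lying directly in val/ (ImageFolder will not pick these up).
    stray = sum(1 for p in val_dir.iterdir() if p.is_file() and p.suffix == ext)
    return len(class_dirs), nested, stray
```

For the full ILSVRC2012 validation set you would expect 1000 class folders, 50000 images inside them, and no images left at the top level.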