Get filename when using ImageFolder

cmplx96 · May 12, 2019, 6:24pm

Hi,

I am using ImageFolder to load a test dataset to submit to Kaggle.
I need the filename because it contains the id.
Is there any way I can get this with ImageFolder?

Thanks!

rwightman · May 12, 2019, 11:50pm

Since there shouldn’t be any shuffling when you run through your test dataset, you can just match up the filenames in the imgs member of the dataset with your outputs after you’ve run inference. It’s less messy than trying to return the filename as an extra element in the `__getitem__` tuple.

I usually add a member like the one below to my dataset class. You could also implement it as a free function and directly access the imgs member from outside the dataset class.

    # needs `import os` at the top of the module
    def filenames(self, indices=None, basename=False):
        # self.imgs is a list of (path, class_index) tuples
        if indices:
            # grab specific indices
            paths = [self.imgs[i][0] for i in indices]
        else:
            # no indices given: return every path, in dataset order
            paths = [x[0] for x in self.imgs]
        if basename:
            return [os.path.basename(p) for p in paths]
        return paths

A very minimal example that zips the filenames with the labels from an inference loop is here: https://github.com/rwightman/pytorch-image-models/blob/master/inference.py
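
To make the matching-by-position idea concrete, here is a rough sketch in plain Python. It is not the code from the linked script. The `imgs` list is a made-up stand-in for `dataset.imgs`, the predictions are invented, and `build_submission` and the `id`/`label` column names are hypothetical; it also assumes the id is the filename without its extension and that inference ran without shuffling.

```python
import csv
import io
import os

# Hypothetical stand-in for `dataset.imgs`: a list of (path, class_index)
# tuples, in the order the dataset yields samples. The paths are made up.
imgs = [
    ("test/unknown/1001.jpg", 0),
    ("test/unknown/1002.jpg", 0),
    ("test/unknown/1003.jpg", 0),
]

# Hypothetical predictions collected from an inference loop run without
# shuffling, so predictions[i] belongs to imgs[i].
predictions = [2, 0, 1]


def build_submission(imgs, predictions):
    """Pair each prediction with the id taken from the file's basename."""
    if len(imgs) != len(predictions):
        raise ValueError("expected one prediction per image")
    rows = []
    for (path, _class_index), pred in zip(imgs, predictions):
        # Assumption: the id is the filename without its extension.
        image_id = os.path.splitext(os.path.basename(path))[0]
        rows.append((image_id, pred))
    return rows


rows = build_submission(imgs, predictions)

# Write the rows as CSV; an in-memory buffer keeps the sketch side-effect free.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "label"])
writer.writerows(rows)
print(buffer.getvalue())
```

With a real dataset you would pass `dataset.imgs` and the predictions gathered from your own loop, and write to a file instead of the buffer. The length check is a cheap guard against a loop that dropped or duplicated samples.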

nofreewill · February 10, 2020, 1:17pm

    dataset = ImageFolder(…)
    dataset.imgs

dataset.imgs is a list of (path, class_index) tuples in the same order as the dataset, so it contains all the filenames (dataset[0] → dataset.imgs[0], …)