How to create an is_valid_file function for ImageFolder

Nathan_Barry · April 10, 2020, 12:37am

I created my first CNN using Microsoft’s Dogs vs Cats dataset. Both times I’ve run it, training stopped because of a corrupt image. I read the ImageFolder docs and see that is_valid_file takes a function, but I have no idea how to write one that detects corrupt images. Can someone give me code that can do that? Thanks.

ptrblck · April 10, 2020, 5:04am

Once you’ve created the Dataset, you could iterate it and use the index (and thus file path) to either fix the file (re-download, recreate) or remove it from the folder.
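For the original question, a minimal sketch of an is_valid_file function could look like the one below. It assumes Pillow is installed (ImageFolder's default loader uses it) and relies on `Image.verify()`, which checks file integrity without decoding the pixel data. The bare `except Exception` is a deliberate simplification for illustration.

```python
from PIL import Image


def is_valid_file(path: str) -> bool:
    """Return True if Pillow can open and verify the file at `path`."""
    try:
        with Image.open(path) as img:
            # verify() checks the file for corruption without decoding pixels
            img.verify()
        return True
    except Exception:
        # Unreadable, missing, or non-image files are treated as invalid
        return False
```

It would then be passed as `ImageFolder(root, is_valid_file=is_valid_file)`, where `root` is your dataset directory. Note that `verify()` may not catch every truncated file, since it does not decode the image; a stricter (and slower) check would reopen the file and call `img.load()`.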

Also, what error message are you getting?

M_Djamaluddin · July 1, 2020, 1:25pm

I have a question: why, in torchvision 0.2.1, do I get the error “__init__() got an unexpected keyword argument ‘is_valid_file’” when I supply an is_valid_file function? I can see from the docs that the argument exists.

ptrblck · July 1, 2020, 5:16pm

torchvision 0.2.1 was released on April 24, 2018, based on the release notes, while is_valid_file was added on April 25, 2019 in this PR (so a year later).
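A quick way to check whether the installed version accepts the argument is to inspect the constructor signature. The helper below is a hypothetical sketch (the name `supports_keyword` is made up for illustration) and uses only the standard library:

```python
import inspect


def supports_keyword(callable_obj, name: str) -> bool:
    """Return True if `callable_obj` declares a parameter called `name`."""
    # For a class, inspect.signature reports the parameters of its __init__
    return name in inspect.signature(callable_obj).parameters
```

With torchvision installed, `supports_keyword(ImageFolder, "is_valid_file")` should tell you whether the argument is available, and `torchvision.__version__` shows which version is actually being imported.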

Where did you find the docs which mention it in 0.2.1?

M_Djamaluddin · July 2, 2020, 12:51am

My bad. I have had 0.2.1 on my system all this time, and I thought it already included that argument. I just reinstalled pytorch and torchvision, and there is no problem anymore.
Thanks for your time.

Pyroka · December 16, 2020, 10:26am

What does that mean? Can you show a sample code?

If I iterate my dataset, I already get the error (caused by a corrupt jpg file in my image folder) before I can do anything in the for loop.

ptrblck · December 16, 2020, 10:29am

That’s the idea: iterating is meant to debug the issue further. Once you’ve isolated the index where the dataset fails, check the corresponding file.

A code snippet would be:

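# `dataset` is the ImageFolder instance created earlier
for idx in range(len(dataset)):
    try:
        sample = dataset[idx]  # loads and decodes the image at this index
    except Exception as e:
        # dataset.samples[idx] holds the (path, class_index) pair
        print(idx, dataset.samples[idx][0], e)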
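If there may be more than one bad file, the same loop can collect every failure instead of only printing. The helper below is a sketch, not part of torchvision: `find_broken_samples` is a hypothetical name, and it assumes the dataset exposes a `samples` list of (path, class_index) pairs, as ImageFolder does.

```python
from typing import Any, List, Tuple


def find_broken_samples(dataset: Any) -> List[Tuple[int, str, str]]:
    """Return (index, path, error) for every sample that fails to load."""
    broken = []
    # ImageFolder keeps (path, class_index) pairs in `samples`;
    # other dataset types may not have this attribute
    samples = getattr(dataset, "samples", None)
    for idx in range(len(dataset)):
        try:
            dataset[idx]  # triggers loading and any transforms
        except Exception as e:
            path = samples[idx][0] if samples is not None else "<unknown>"
            broken.append((idx, path, repr(e)))
    return broken
```

The returned paths can then be re-downloaded, recreated, or removed from the folder before training.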