How to create an is_valid_file function for ImageFolder

I created my first CNN using Microsoft’s Dogs vs Cats dataset. Both times I’ve run it, training has stopped because of a corrupt image. I read the ImageFolder docs and see that is_valid_file takes a function, but I have no idea how to write a function that detects a corrupt image. Can someone give me code that does that? Thanks.

Once you’ve created the Dataset, you could iterate over it and use the failing index (and thus the file path) to either fix the file (re-download, recreate) or remove it from the folder.

Also, what error message are you getting?

I have a question: why, in torchvision 0.2.1, do I get the error “__init__() got an unexpected keyword argument ‘is_valid_file’” when I supply an is_valid_file function? I can see in the docs that the argument exists.

Based on the release notes, torchvision 0.2.1 was released on April 24, 2018, while is_valid_file was added on April 25, 2019 in this PR (so a year later).
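
If you want to confirm what the installed version supports before passing the argument, a small check along these lines might help. This is only a sketch: `accepts_keyword` is a hypothetical helper name, not part of torchvision, and it relies only on Python's standard `inspect` module.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` can be called with a keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    # Either the parameter is declared explicitly ...
    if name in params and params[name].kind in keyword_kinds:
        return True
    # ... or the callable swallows arbitrary keywords through **kwargs.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical usage against the installed torchvision:
# import torchvision
# from torchvision.datasets import ImageFolder
# print(torchvision.__version__)
# print(accepts_keyword(ImageFolder, "is_valid_file"))
```

If the check reports False, the installed release predates the argument and upgrading torchvision is the fix.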

Where did you find the docs which mention it in 0.2.1?

My bad. I’ve had 0.2.1 on my system all this time and assumed it already included that argument. I just reinstalled pytorch and torchvision, and the problem is gone.
Thanks for your time.

What does that mean? Can you show some sample code?

If I iterate over my dataset, I already get the error (caused by a corrupt JPG file in my image folder) before I can do anything inside the for loop.

That’s the idea: the failure is what lets you debug the issue further. Once you’ve isolated the index where the dataset fails, check the corresponding file.

A code snippet would be:

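# Load every sample once; an index that raises points at a corrupt file.
for idx in range(len(dataset)):
    try:
        sample = dataset[idx]
    except Exception as e:
        # For ImageFolder, dataset.samples[idx] is (file path, class index).
        print(idx, dataset.samples[idx][0], e)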
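
To come back to the original question, a function for is_valid_file could look roughly like the following. This is a minimal sketch, assuming Pillow is installed (torchvision's image loading is built on it) and that "valid" means "Pillow can fully decode the file". The folder path in the usage comment is a placeholder.

```python
from PIL import Image


def is_valid_file(path: str) -> bool:
    """Return True only if Pillow can open and fully decode the file at `path`."""
    try:
        with Image.open(path) as img:
            # Image.open() is lazy and mostly reads the header;
            # load() forces the pixel data to be decoded as well.
            img.load()
        return True
    except Exception:
        # Covers missing, zero-byte, truncated, and non-image files.
        return False


# Hypothetical usage (replace the path with your own image folder):
# from torchvision.datasets import ImageFolder
# dataset = ImageFolder("path/to/images", is_valid_file=is_valid_file)
```

Two things to keep in mind. The check runs once for every file when the dataset is constructed, so startup gets slower on a large folder. Also, as far as I can tell, ImageFolder does not apply its usual extension filter when is_valid_file is given, so the function has to reject non-image files itself, which the decode attempt above does.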