PIL.UnidentifiedImageError in Dataloader

Izu97 · March 18, 2020, 2:59am

import torch
from PIL import ImageFile
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# crop, mean, std, current_dir, cutoff, batch_size and workers are defined elsewhere in the script.
def load_data():
    # Lets PIL load truncated files; it does not help with files PIL cannot identify at all.
    ImageFile.LOAD_TRUNCATED_IMAGES = True

    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
    print("Running on {}".format(device))
    data_transform = transforms.Compose([
            transforms.RandomResizedCrop(crop),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean,
                                std=std)
        ])
    dataset = datasets.ImageFolder(root=current_dir,
                                   transform=data_transform
                                   )

    print("Size of dataset is ", len(dataset))

    # Random train/test split; random_split shuffles the indices itself.
    Ntrain = int(len(dataset) * cutoff)
    train_data, test_data = random_split(dataset, [Ntrain, len(dataset) - Ntrain])

    # A DataLoader has no .to(); move each batch to `device` inside the training loop.
    train_loader = DataLoader(train_data,
                              batch_size=batch_size,
                              shuffle=True,
                              num_workers=workers
                              )

    test_loader = DataLoader(test_data,
                             batch_size=batch_size,
                             shuffle=True,
                             num_workers=workers
                             )
    return train_loader, test_loader

When I load the data into the DataLoader, the following error is raised. My dataset contains a corrupted image named 666.jpg. How can I fix this issue? Thank you.

error msg:
"cannot identify image file %r" % (filename if filename else fp)
PIL.UnidentifiedImageError: cannot identify image file <_io.BufferedReader name='E:\My Projects\Cat-vs-Dog-Image-Classifier-Differnet-Models-with-Pytroch\Data\Cat\666.jpg'>

ptrblck · March 18, 2020, 4:50am

Could you check whether the file is corrupt, i.e. can you open it with another image viewer?
If it cannot be opened, you would either have to download it again or remove it.
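
For a larger dataset, opening files one by one in a viewer gets tedious. Below is a minimal sketch of an automated scan, assuming Pillow is installed. `find_unreadable_images` is a hypothetical helper name, not part of any library. Note that `Image.verify()` does not decode the full pixel data, so some truncated files may still pass, and that any non-image file under the root folder is reported as well.

```python
from pathlib import Path

from PIL import Image


def find_unreadable_images(root):
    """Return the paths under `root` that PIL cannot open and verify."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with Image.open(path) as img:
                img.verify()  # integrity check without decoding all pixel data
        except (OSError, SyntaxError):
            # PIL.UnidentifiedImageError is a subclass of OSError;
            # some decoders raise SyntaxError for broken files.
            bad.append(path)
    return bad
```

Calling `find_unreadable_images(current_dir)` and printing the result should list files such as 666.jpg, which can then be re-downloaded or deleted.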

Izu97 · March 18, 2020, 8:44am

Thank you. I manually checked the dataset and found only 2 corrupted images. After removing them, everything works fine.

Wenuka_Gunarathna · October 13, 2022, 4:52pm

Is there a way to ignore corrupted images (in the dataset or in the dataloader) instead of manually removing them?

ambujpawar · October 14, 2022, 12:20pm

Hi, yes, this can be done with the is_valid_file argument of ImageFolder, which is also available on other dataset loaders inheriting from DatasetFolder.

The documentation states that:

is_valid_file (optional): A function that takes path of a file
and checks if the file is a valid file
(used to check of corrupt files) both extensions and
is_valid_file should not be passed. Defaults to None.

For more information, look into this
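
If the skipping should happen in the dataloader rather than when the dataset is built, one common pattern is to return None for samples that fail to load and drop them in a custom collate_fn. This is only a sketch under that assumption: `SkipBrokenSamples` and `collate_skip_none` are hypothetical names, not torchvision or PyTorch APIs.

```python
from torch.utils.data import Dataset
from torch.utils.data.dataloader import default_collate


class SkipBrokenSamples(Dataset):
    """Wrap a dataset so that samples that fail to load come back as None."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        try:
            return self.dataset[index]
        except (OSError, SyntaxError):
            # PIL.UnidentifiedImageError is a subclass of OSError
            return None


def collate_skip_none(batch):
    """Collate a batch after dropping the None placeholders."""
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # every sample failed; the training loop should skip this batch
    return default_collate(batch)
```

It would be used as `DataLoader(SkipBrokenSamples(dataset), batch_size=batch_size, collate_fn=collate_skip_none)`. Batches that contained a broken file come out smaller than batch_size, and the loop has to handle a None batch.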