PIL.UnidentifiedImageError in Dataloader

import torch
from PIL import ImageFile
from torchvision import datasets, transforms

def load_data():
    # Lets PIL load truncated files; it does not help with files PIL cannot identify at all
    ImageFile.LOAD_TRUNCATED_IMAGES = True

    device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
    print("Running on {}".format(device))
    data_transform = transforms.Compose([
            transforms.RandomResizedCrop(crop),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean,
                                std=std)
        ])
    dataset = datasets.ImageFolder(root=current_dir,
                                   transform=data_transform
                                   )

    print("Size of dataset is ", len(dataset))

    # A Dataset cannot be sliced, so split it with random_split (which also shuffles)
    Ntrain = int(len(dataset) * cutoff)
    train_data, test_data = torch.utils.data.random_split(
        dataset, [Ntrain, len(dataset) - Ntrain])

    # A DataLoader has no .to(); move each batch to `device` inside the training loop
    train_loader = torch.utils.data.DataLoader(train_data,
                                                batch_size=batch_size,
                                                shuffle=True,
                                                num_workers=workers
                                                )

    test_loader  = torch.utils.data.DataLoader(test_data,
                                                batch_size=batch_size,
                                                shuffle=True,
                                                num_workers=workers
                                                )
    return train_loader, test_loader

After loading the data into the DataLoader, I got the following error. My dataset contains a corrupted image named 666.jpg. How can I fix this issue? Thank you.

Error message:
"cannot identify image file %r" % (filename if filename else fp)
PIL.UnidentifiedImageError: cannot identify image file <_io.BufferedReader name='E:\My Projects\Cat-vs-Dog-Image-Classifier-Differnet-Models-with-Pytroch\Data\Cat\666.jpg'>


Could you check whether the file is corrupt, i.e. can you open it with another viewer?
If it is corrupt, you would either have to download it again or remove it.
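For a larger dataset, opening every file by hand is impractical. The check can be scripted with PIL instead. The following is only a minimal sketch: the function name find_corrupt_images and the IMAGE_SUFFIXES set are made up for illustration, and Image.verify() catches many but not necessarily all kinds of damage.

```python
from pathlib import Path

from PIL import Image

# Hypothetical set of suffixes to scan; adjust it to the formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def find_corrupt_images(root):
    """Return the image paths under `root` that PIL cannot open or verify."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            with Image.open(path) as img:
                img.verify()  # integrity check that does not decode the full image
        except Exception:
            # Image.open raises UnidentifiedImageError for unreadable files;
            # verify() can raise other exception types depending on the format.
            bad.append(path)
    return bad
```

Calling find_corrupt_images on the dataset root (current_dir in the question) returns the list of suspicious files, which can then be re-downloaded or deleted.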


Thank you. I checked the dataset manually and found only 2 corrupted images. After removing them, everything works fine.

Is there a way to ignore corrupted images (in the dataset or in the dataloader) instead of manually removing them?

Hi, yes, this can be done with the is_valid_file argument of ImageFolder, as well as of other dataset loaders that inherit from DatasetFolder.

The documentation states that:

is_valid_file (optional): A function that takes path of a file
and checks if the file is a valid file
(used to check of corrupt files) both extensions and
is_valid_file should not be passed. Defaults to None.

For more information, look into this