OSError: cannot identify image file <_io.BufferedReader name='Goa/train/Chapora_fort/fortress-tower-chapora-fort-background-600w-1452260633.jpg'>

kabilan · January 8, 2020, 1:10am

OSError: cannot identify image file <_io.BufferedReader name='Goa/train/Chapora_fort/fortress-tower-chapora-fort-background-600w-1452260633.jpg'>

I built a custom dataset and everything went well. During training I encountered an OSError.

ptrblck · January 8, 2020, 8:05am

Could you check the printed path and try to load this image manually in another script?
Also, are you passing file paths or file pointers to Image.open?

kabilan · January 8, 2020, 12:38pm

Yes, I tried to open the image and it raised an error; other images open fine. There are many such images in the dataset.

ptrblck · January 8, 2020, 8:49pm

Does the size of the file look correct, or might it have been corrupted, e.g. during the download?
I’m not sure if there is a better workaround than to remove these files from your dataset.
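One way to run that check without any imaging library is to print the file size and compare the first bytes against well-known format signatures. The sketch below is only an illustration: `sniff_image_format` is a hypothetical helper name, and it covers just four formats (JPEG, PNG, WebP, GIF).

```python
import os

def sniff_image_format(path):
    """Return (size_in_bytes, guessed_format) from the file's leading bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(b"\xff\xd8\xff"):
        fmt = "jpeg"
    elif head.startswith(b"\x89PNG\r\n\x1a\n"):
        fmt = "png"
    elif head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        # WebP is a RIFF container with "WEBP" at byte offset 8
        fmt = "webp"
    elif head[:6] in (b"GIF87a", b"GIF89a"):
        fmt = "gif"
    else:
        fmt = "unknown"
    return size, fmt
```

Calling `sniff_image_format(path)` on the file named in the error shows whether it is empty or truncated (a very small size) or whether its content does not match the `.jpg` extension.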

midhunharikumar · January 8, 2020, 9:36pm

You could consider skipping the file in the dataset (returning None when it fails to load) and using a custom collate function to filter out those missing values.

import torch
from torch.utils.data.dataloader import default_collate

def my_collate(batch):
    """Drop samples the dataset returned as None, then collate the rest."""
    batch = [sample for sample in batch if sample is not None]
    return default_collate(batch)

# shuffle=True is omitted: DataLoader does not accept it together with a sampler
train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, sampler=train_sampler,
        num_workers=args.num_workers, pin_memory=True, collate_fn=my_collate)
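
For `my_collate` to have anything to filter, the dataset itself has to return None instead of raising. A minimal sketch of one way to do that is a thin wrapper around an existing map-style dataset. `SkipUnreadable` is a hypothetical name, and the sketch assumes the wrapped dataset raises OSError from `__getitem__`, as in the error reported in this thread.

```python
class SkipUnreadable:
    """Wrap a map-style dataset so unreadable samples come back as None."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        try:
            return self.dataset[index]
        except OSError:
            # Assumption: the image loader raises OSError for files it cannot identify
            return None
```

The wrapped object, e.g. `SkipUnreadable(train_dataset)`, would then be passed to the DataLoader in place of `train_dataset`.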

If you have a lot of missing images, this might cause your batch size to fluctuate too much. In that case, preprocessing the images offline and training only on images that are intact will resolve your issue.
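
As a rough sketch of such an offline pass, the script below walks a dataset folder, tries each file with Pillow, and collects (or optionally deletes) the ones that fail. The names `pil_can_open` and `find_unreadable` are hypothetical, and the sketch assumes every file under the root is meant to be an image. It is worth running with `delete=False` first and reviewing the returned list.

```python
import os

def pil_can_open(path):
    """Return True if Pillow can identify the file and verify() passes."""
    from PIL import Image  # imported lazily so find_unreadable works with other checks too
    try:
        with Image.open(path) as img:
            img.verify()
        return True
    except Exception:
        # Any failure while opening or verifying counts as unreadable
        return False

def find_unreadable(root, can_open=pil_can_open, delete=False):
    """Return a sorted list of files under root that fail can_open; optionally delete them."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not can_open(path):
                bad.append(path)
                if delete:
                    os.remove(path)
    return sorted(bad)
```

For the dataset in this thread the call might look like `find_unreadable("Goa/train")`, with `delete=True` only once the list has been checked.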

kabilan · January 9, 2020, 12:55am

I found that PIL is not opening WebP images. It asked me to delete those files. Is there any way to delete those files using Python code?

amin_asadi · March 9, 2021, 4:58pm

Hello,
I have a problem similar to yours.
Did you find a solution?