OSError: cannot identify image file <_io.BufferedReader name='Goa/train/Chapora_fort/fortress-tower-chapora-fort-background-600w-1452260633.jpg'>

I built a custom dataset and everything worked, but during training I ran into this OSError.

Could you check the printed path and try to load this image manually in another script?
Also, are you passing file paths or file pointers to Image.open?
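To load the image manually, a small standalone script like the following could help. It is a sketch that assumes Pillow is installed; `check_image` is a hypothetical helper name. It passes the file path (a string) to `Image.open`, so Pillow opens and closes the file itself, and calls `verify()`, which checks the file for damage without decoding the pixel data.

```python
import sys

from PIL import Image


def check_image(path):
    """Try to open and verify one image file; return (ok, message)."""
    try:
        # Pass the path (a str) so Pillow manages the file handle itself.
        with Image.open(path) as img:
            fmt, size = img.format, img.size
            img.verify()  # integrity check without a full decode
        return True, f"{fmt} {size[0]}x{size[1]}"
    except (OSError, SyntaxError) as exc:
        # OSError also covers PIL.UnidentifiedImageError and missing files.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_images.py a.jpg b.jpg
    for p in sys.argv[1:]:
        ok, msg = check_image(p)
        print(("OK   " if ok else "FAIL ") + p + " -> " + msg)
```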

Yes, I tried to open the image and it raised the same error, while other images work. There are many such images in the dataset.

Does the file size look correct, or might the file have been corrupted, e.g. during the download?
I’m not sure there is a better workaround than removing these files from your dataset.
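One quick way to answer the file-size and corruption question is to look at each file's size and its first few bytes ("magic numbers"), which identify the real format regardless of the extension. This stdlib-only sketch recognizes JPEG, PNG, GIF and WebP signatures and flags files that start like an HTML page, which can happen when a download saves an error page instead of the image. The helper name `sniff` and the short signature list are assumptions for illustration, not an exhaustive format detector.

```python
import os

# Leading "magic" bytes of a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff(path):
    """Return (size_in_bytes, detected_format) based on the file's first bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(12)
    # WebP files are RIFF containers: "RIFF" <4-byte length> "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return size, "webp"
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return size, name
    if head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return size, "html"  # e.g. an error page saved instead of the image
    return size, "unknown"
```

A size of 0 or a format of `"unknown"` or `"html"` points to a broken download, while `"webp"` on a `.jpg` file means the extension does not match the content.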

You could consider skipping unreadable files in the Dataset (e.g. by returning None from __getitem__) and using a custom collate function to remove those None values from each batch.

from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate

def my_collate(batch):
    "Drop samples the Dataset returned as None, then collate the rest into batch tensors"
    batch = [sample for sample in batch if sample is not None]
    return default_collate(batch)

train_loader = DataLoader(
        train_dataset, batch_size=args.batch_size,
        shuffle=(train_sampler is None),  # DataLoader rejects shuffle=True together with a sampler
        num_workers=args.num_workers, pin_memory=True, sampler=train_sampler, collate_fn=my_collate)
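The collate function only drops samples that the Dataset itself returns as None, so `__getitem__` has to catch the error. A minimal sketch of that side, assuming the samples are given as a list of `(path, label)` pairs; the class name `SkipBadImages` and its constructor arguments are hypothetical. A map-style dataset only needs `__len__` and `__getitem__` for DataLoader, though in practice you would usually subclass `torch.utils.data.Dataset`.

```python
from PIL import Image


class SkipBadImages:
    """Map-style dataset over (path, label) pairs; returns None for unreadable files."""

    def __init__(self, samples, transform=None):
        self.samples = samples      # list of (file_path, label) tuples
        # e.g. a torchvision transform ending in ToTensor();
        # default_collate cannot batch raw PIL images.
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        try:
            with Image.open(path) as img:
                rgb = img.convert("RGB")  # full decode while the file is open
        except OSError:  # includes PIL.UnidentifiedImageError and truncated files
            return None  # removed later by the custom collate function
        if self.transform is not None:
            rgb = self.transform(rgb)
        return rgb, label
```

Note that `default_collate` still fails if every sample in a batch came back as None, since there is nothing left to collate.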

If many images are unreadable, this can make the effective batch size fluctuate a lot. In that case, preprocess the images offline and train only on images that are intact.
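Offline preprocessing could look like the following sketch, which assumes Pillow and an ImageFolder-style layout (`root/<class>/<image>`). It re-encodes every file Pillow can decode as an RGB JPEG in a separate output directory and returns the paths it skipped, so training only sees intact files. The function name `clean_copy` and the `quality=95` default are assumptions, not something from this thread.

```python
import os

from PIL import Image


def clean_copy(src_root, dst_root, quality=95):
    """Re-save every decodable image under src_root as an RGB JPEG under dst_root.

    The sub-folder layout is preserved; undecodable files are skipped and returned.
    """
    skipped = []
    for dirpath, _, filenames in os.walk(src_root):
        out_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                with Image.open(src) as img:
                    rgb = img.convert("RGB")  # forces a full decode
            except OSError:
                skipped.append(src)
                continue
            os.makedirs(out_dir, exist_ok=True)
            stem = os.path.splitext(name)[0]
            rgb.save(os.path.join(out_dir, stem + ".jpg"), "JPEG", quality=quality)
    return skipped
```

Keep `dst_root` outside `src_root`, otherwise the walk may also visit the output. Two source files that differ only in extension (e.g. `a.png` and `a.jpg`) map to the same output name, so rename them first if that occurs.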

I found that PIL cannot open the WebP images, so those files need to be deleted. Is there a way to delete them with Python code?
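One way to do the deletion from Python is a sketch like this. It assumes Pillow and treats a file as unreadable when `Image.open` or a full `load()` raises `OSError`, which covers `UnidentifiedImageError` and truncated files. The helper name `remove_unreadable` is hypothetical. It defaults to `dry_run=True`, which only lists the files, because it would also delete any non-image files in the folder.

```python
import os

from PIL import Image


def remove_unreadable(root, dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) files under root that Pillow cannot decode."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    img.load()  # decode fully so truncated files are caught too
            except OSError:
                bad.append(path)
                if not dry_run:
                    os.remove(path)
    return bad
```

Whether WebP files count as unreadable depends on how Pillow was built: `PIL.features.check("webp")` returns `True` when WebP support is available, in which case those images open normally and are kept.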

Hello, I hope you are doing well.
I have a problem similar to yours. Did you find a solution?