Corrupted images with .bmp extension

I am working on a project, but whenever I try to show an image I get an error saying the image cannot be identified. I looked around and printed all the corrupted images; it turns out that all of them have the .bmp extension, and there are a lot of them. I can’t remove them. What should I do? ;(

@ptrblck

How do you try to visualize them and what’s the error?


Although I have included the snippet below, I still get the same error.
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
I tried to visualize some of the images and, by chance, it worked! But when I move on to the training step, the same error arises.

This is how I identified the corrupted images, @ptrblck:
from PIL import Image
import os
import numpy as np
import cv2

def check_images(directory):
    # Each key collects the names of the files that triggered that issue
    issues = {'corrupted': [], 'low_resolution': [], 'blurry': []}
    for root, _, files in os.walk(directory):
        for file in files:
            try:
                with Image.open(os.path.join(root, file)) as img:
                    # Check for low resolution
                    if img.width < 800 or img.height < 600:
                        issues['low_resolution'].append(file)

                    # Check for blurriness on a grayscale copy, so palette and
                    # 1-bit BMPs are handled; np.array also forces a full decode
                    img_np = np.array(img.convert('L'))
                    variance_of_laplacian = cv2.Laplacian(img_np, cv2.CV_64F).var()
                    if variance_of_laplacian < 100:
                        issues['blurry'].append(file)
            except (IOError, SyntaxError):
                # PIL could not identify or decode the file
                issues['corrupted'].append(file)

    return issues

issues = check_images(DATADIR)
print(issues)

If these images are indeed corrupted and PIL has trouble reading them, you might need to re-download them. Are you able to open these images with any other image viewer?

Yes, I opened them in the Kaggle data description section and they are totally fine!
But what do you mean by re-uploading them?

Does the problem occur because of the environment’s RAM limitation? Each time I re-download the data I get a different number of corrupted images.

No, I don’t think the RAM is related to this issue. It seems the download itself might be failing and creating corrupted images.
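
If the download is what damages the files, a quick way to check is to compare the size recorded in each BMP file header with the size of the file on disk. The sketch below uses only the standard library. It relies on the BMP layout, where a file starts with the two bytes `BM` followed by a 4-byte little-endian file size. The function name `check_bmp_header` and the status strings are made up for illustration, and the check is only a heuristic, since some writers store an inaccurate size in the header.

```python
import os
import struct

BMP_HEADER_SIZE = 14  # length of the BMP file header in bytes


def check_bmp_header(path):
    """Return a short status string based on the BMP file header (hypothetical helper)."""
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(BMP_HEADER_SIZE)

    if len(header) < BMP_HEADER_SIZE:
        return "too short"  # not even a complete header was written

    # Bytes 0-1: magic "BM"; bytes 2-5: declared file size (little-endian)
    magic, declared_size = struct.unpack("<2sI", header[:6])
    if magic != b"BM":
        return "bad magic"  # not a BMP at all, e.g. an error page saved as .bmp
    if declared_size > actual_size:
        return "truncated"  # header promises more bytes than are on disk
    return "ok"
```

Running this over the files that PIL rejects would show whether they are cut short ("truncated", "too short") or are not BMP data at all ("bad magic"). Either result points at the download rather than at RAM.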

Can you suggest an alternative way to load the dataset, which consists of 19,890 images, other than ImageFolder?
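
One option that keeps ImageFolder but skips files PIL cannot read is its `is_valid_file` argument, which takes a function that receives a file path and returns whether to include it. The following is a minimal sketch, not a tested training setup: `is_readable_image` and `build_dataset` are hypothetical names, and the check decodes every image once when the dataset is built, which may take a while for 19,890 files. It also assumes `ImageFile.LOAD_TRUNCATED_IMAGES` is left at its default, otherwise truncated files may pass the check.

```python
from PIL import Image


def is_readable_image(path):
    """Return True if PIL can open and fully decode the file (hypothetical helper)."""
    try:
        with Image.open(path) as img:
            img.load()  # Image.open is lazy; load() forces a full decode
        return True
    except (OSError, SyntaxError):
        # Covers "cannot identify image file" and decode failures
        return False


def build_dataset(root, transform=None):
    """Build an ImageFolder that only keeps files passing is_readable_image."""
    # Imported here so the checker above can be used without torchvision installed
    from torchvision.datasets import ImageFolder

    return ImageFolder(root, transform=transform, is_valid_file=is_readable_image)
```

With this, `build_dataset(DATADIR)` would return a dataset that leaves out the unreadable .bmp files instead of failing during training. It does not repair the files, so re-downloading is still the real fix.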