Dropping bad images when loading a directory with torch.utils.data.DataLoader?

I have a bunch of images, but some of them are invalid. I cleaned them with PIL, but loading the images still produces errors:

UserWarning: Corrupt EXIF data.  Expecting to read 4 bytes but only got 0

Does anybody know how to solve this?

You can handle this in ImageFolder (https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py); you can modify it to suit your own task.

Thanks for your reply. BTW, I ran a cleaning pass and it seems all images are clean now. But another question came up: I have a HUGE dataset of almost 28 GB of images in 20 classes. These images are very big (nothing like MNIST or CIFAR10), and I want to train a classifier on this dataset.

But it seems the PyTorch DataLoader loads all images at once: my program is still loading data, which is very inefficient. Does PyTorch have anything like TensorFlow's TFRecords, or a generator-style loader, so that it doesn't load all images at once?

DataLoader is an iterable; it won't load all images at once. Images are read lazily, batch by batch, as you iterate over it.
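A minimal sketch to check this: the hypothetical CountingDataset below only records how many times __getitem__ is called, standing in for a dataset that would open and decode an image file there. After pulling one batch, only batch_size samples have been loaded out of 10,000.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CountingDataset(Dataset):
    """Hypothetical dataset that records how many samples were actually loaded."""

    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.loaded = 0  # incremented every time __getitem__ runs

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # In a real image dataset this is where the file would be opened and
        # decoded; nothing is read before the DataLoader asks for this index.
        self.loaded += 1
        return torch.full((3,), float(index))


dataset = CountingDataset(num_samples=10_000)
# num_workers=0 keeps loading in this process, so the counter is visible here.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

batch_iter = iter(loader)
first_batch = next(batch_iter)
print(first_batch.shape, dataset.loaded)  # torch.Size([4, 3]) 4
```

With num_workers > 0 the loading happens in worker processes (each prefetching a small number of batches ahead), so memory use depends on batch size and worker count rather than on the size of the dataset.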

Check out nonechucks, a library I wrote for PyTorch that lets you do exactly that (and more)! @jinfagang @SherlockLiao
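If you'd rather not add a dependency, the general skip-on-error idea can be sketched with plain torch.utils.data. This is not nonechucks' implementation or API; SkipBadSamples and collate_skip_none are hypothetical names. The wrapper returns None for any sample that raises while loading, and a custom collate_fn drops those before batching.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate


class SkipBadSamples(Dataset):
    """Hypothetical wrapper: returns None instead of raising when a sample fails to load."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        try:
            return self.dataset[index]
        except Exception:
            # e.g. an OSError from PIL on a truncated or corrupt image
            return None


def collate_skip_none(batch):
    # Drop the samples that failed, then batch the rest as usual.
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # every sample in this batch failed
    return default_collate(batch)
```

Usage would look like DataLoader(SkipBadSamples(dataset), batch_size=32, collate_fn=collate_skip_none). Some batches will then be smaller than batch_size, and a batch can be None if every sample in it failed, so the training loop should skip those. Catching every Exception also hides genuine bugs, so you may want to narrow it to OSError.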