Dear PyTorch users,
I wrote a script that loops over the ImageNet dataset and computes the top-1 and top-5 accuracies on both the train and validation sets. For validation the loop works perfectly. However, for train I get the following error:
OSError: cannot identify image file <_io.BufferedReader name='/path_to_imagenet/train/n04266014/n04266014_10835.JPEG'>
After some investigation I found that the image that cannot be loaded is in fact 0 bytes in size, meaning it’s corrupted. Not only that, but all the other images in that class are also 0 bytes. At that point I thought the issue might be with the downloaded ImageNet archive, so I checked its md5sum. However, it matched the one posted on the ImageNet website.
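For anyone who wants to run the same kind of check, here is a minimal sketch of how the 0-byte files could be listed per class folder. It is not necessarily the exact script I used: the helper names `find_empty_files` and `count_empty_per_class` are hypothetical, and it assumes the usual layout of one subdirectory per class under the train directory.

```python
import os
from collections import Counter


def find_empty_files(root):
    """Return the sorted paths of all zero-byte files under root (hypothetical helper)."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip entries whose size cannot be read (e.g. broken symlinks).
                continue
            if size == 0:
                empty.append(path)
    return sorted(empty)


def count_empty_per_class(root):
    """Count zero-byte files per top-level subdirectory of root.

    Assumes one subdirectory per class, e.g. root/n04266014/xxx.JPEG.
    """
    counts = Counter()
    for path in find_empty_files(root):
        # The first component of the relative path is the class folder.
        class_name = os.path.relpath(path, root).split(os.sep)[0]
        counts[class_name] += 1
    return counts
```

Calling `count_empty_per_class` on the train directory (here `/path_to_imagenet/train`) would show whether the problem is limited to one class folder or affects several.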
At this point I am wondering:
- Does the ImageNet dataset contain images that are 0 bytes in size (to the point where a whole class consists of corrupted images)?
- Is PyTorch's ImageFolder class supposed to be able to handle those 0-byte images?
- What other explanation could there be for the 0-byte images, other than the downloaded tar file being broken?
- Could this be related to confusion between PIL and Pillow in my conda setup?
Thanks in advance,
V