'Corrupt EXIF data' messages when training ImageNet

Dear all,

I have started some experiments using the ImageNet example from the PyTorch examples repository (branch 0.3.1). I downloaded and processed the data as instructed at
https://github.com/soumith/imagenet-multiGPU.torch (extracting the many per-class archives inside ILSVRC2012_img_train.tar and running valprep.sh).

When training a model (e.g. AlexNet), I see warnings like these a few times per epoch:

(…) /pylocal/lib/python2.7/site-packages/PIL/TiffImagePlugin.py:756: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))

(…) /pylocal/lib/python2.7/site-packages/PIL/TiffImagePlugin.py:739: UserWarning: Possibly corrupt EXIF data. Expecting to read 2555904 bytes but only got 0. Skipping tag 0
" Skipping tag %s" % (size, len(data), tag))

Is this expected (i.e. some of the ImageNet files simply have bad EXIF data, and this shouldn't interfere with training)? Or does it suggest that my dataset is corrupt?

Thanks in advance.
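One way to answer the "is my dataset corrupt?" question empirically is to decode every file and record which ones emit these warnings and which fail to decode at all. A file that warns but still decodes has had its pixel data read successfully; only its metadata is malformed. This is a minimal sketch assuming Pillow is installed; `scan_images` and `decode_with_pil` are hypothetical helper names, and `decode_with_pil` is an assumed stand-in for the default image loader (open, then convert to RGB).

```python
import re
import warnings

# Matches both messages quoted above ("Corrupt EXIF data..." and
# "Possibly corrupt EXIF data..."), ignoring case.
EXIF_WARNING = re.compile(r"(possibly )?corrupt exif data", re.IGNORECASE)


def decode_with_pil(path):
    """Fully decode one image with Pillow (hypothetical helper; assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so scan_images itself does not need Pillow

    with Image.open(path) as img:
        img.convert("RGB")  # forces the pixel data to actually be decoded


def scan_images(paths, decode=decode_with_pil):
    """Return (exif_warned, failed): {path: [messages]} and {path: error text}."""
    exif_warned, failed = {}, {}
    for path in paths:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # don't let "once per location" hide repeats
            try:
                decode(path)
            except OSError as exc:  # Pillow signals undecodable/truncated files with OSError subclasses
                failed[path] = str(exc)
        msgs = [str(w.message) for w in caught if EXIF_WARNING.match(str(w.message))]
        if msgs:
            exif_warned[path] = msgs
    return exif_warned, failed
```

For example, passing it `glob.glob(os.path.expanduser('~/ImageNet/train/**/*.JPEG'), recursive=True)` (path assumed) would list the files behind the warnings; an empty `failed` dict would suggest the image data itself is intact.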

I am also encountering the same issue. @ttb, which CNN model are you training? I am trying to train a Wide ResNet-50-2 model.

@nbansal90 I was training the AlexNet model. Let me know if you find anything!

@ttb I have encountered the same problem with the ImageNet data, although I am training my model in TensorFlow, so this may be a problem with the data itself. I would also like to know whether it interferes with training, because my training sometimes gets stuck and I don't see any progress.

I'm having this issue as well with PyTorch 1.0. Did you find out whether these warnings eventually affected training? Thank you!

Hi,

I’m not sure if it will ever affect the training.
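If you only want a clean log and are willing to leave the files untouched, Python's standard `warnings` filter can hide these two specific messages. This is a sketch, not a fix: it changes nothing in the images. The function name `silence_exif_warnings` is hypothetical; its optional `worker_id` argument exists only so it can double as a DataLoader `worker_init_fn`.

```python
import warnings


def silence_exif_warnings(worker_id=None):
    """Ignore Pillow's '(Possibly) corrupt EXIF data' UserWarnings in this process.

    worker_id is unused; accepting it lets this function be passed as a
    DataLoader worker_init_fn so each worker process installs the filter too.
    """
    # filterwarnings matches the regex against the start of the message, case-insensitively.
    warnings.filterwarnings(
        "ignore",
        message=r"(Possibly )?corrupt EXIF data",
        category=UserWarning,
    )
```

For example, call `silence_exif_warnings()` at the top of the training script and pass `worker_init_fn=silence_exif_warnings` to `torch.utils.data.DataLoader`, so workers started with the spawn method also get the filter. Other warnings are still shown.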

I was able to get rid of the warnings by using a script to remove all exif data with the piexif library.

The script below didn't work at first because one of the JPEGs is actually a PNG. You can fix that with ImageMagick:

mv n02105855/n02105855_2933.JPEG n02105855/n02105855_2933.PNG
convert n02105855/n02105855_2933.PNG n02105855/n02105855_2933.JPEG
rm n02105855/n02105855_2933.PNG
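To check for other files like this instead of relying on one known name, you could look at each file's leading bytes (its signature) rather than its extension. A minimal stdlib-only sketch; it only recognizes the JPEG and PNG signatures, and `sniff_format` / `find_mislabeled_jpegs` are hypothetical names.

```python
import os

JPEG_MAGIC = b"\xff\xd8\xff"          # SOI marker followed by the next marker's 0xFF
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"      # fixed 8-byte PNG signature


def sniff_format(path):
    """Guess a file's real format from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(JPEG_MAGIC):
        return "jpeg"
    if head.startswith(PNG_MAGIC):
        return "png"
    return "unknown"


def find_mislabeled_jpegs(root):
    """Walk root and return sorted (path, format) pairs for .JPEG files that are not JPEG."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.upper().endswith(".JPEG"):
                path = os.path.join(dirpath, name)
                fmt = sniff_format(path)
                if fmt != "jpeg":
                    bad.append((path, fmt))
    return sorted(bad)
```

Each reported path could then be converted with the same mv/convert/rm steps.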

Then for good measure I also converted the JPEGs that were CMYK to RGB, although I’m not sure if this makes a difference:

convert -negate -colorspace RGB n01739381/n01739381_1309.JPEG n01739381/n01739381_1309.JPEG
convert -negate -colorspace RGB n02077923/n02077923_14822.JPEG n02077923/n02077923_14822.JPEG
convert -negate -colorspace RGB n02447366/n02447366_23489.JPEG n02447366/n02447366_23489.JPEG
convert -negate -colorspace RGB n02492035/n02492035_15739.JPEG n02492035/n02492035_15739.JPEG
convert -negate -colorspace RGB n02747177/n02747177_10752.JPEG n02747177/n02747177_10752.JPEG
convert -negate -colorspace RGB n03018349/n03018349_4028.JPEG n03018349/n03018349_4028.JPEG
convert -negate -colorspace RGB n03062245/n03062245_4620.JPEG n03062245/n03062245_4620.JPEG
convert -negate -colorspace RGB n03347037/n03347037_9675.JPEG n03347037/n03347037_9675.JPEG
convert -negate -colorspace RGB n03467068/n03467068_12171.JPEG n03467068/n03467068_12171.JPEG
convert -negate -colorspace RGB n03529860/n03529860_11437.JPEG n03529860/n03529860_11437.JPEG
convert -negate -colorspace RGB n03544143/n03544143_17228.JPEG n03544143/n03544143_17228.JPEG
convert -negate -colorspace RGB n03633091/n03633091_5218.JPEG n03633091/n03633091_5218.JPEG
convert -negate -colorspace RGB n03710637/n03710637_5125.JPEG n03710637/n03710637_5125.JPEG
convert -negate -colorspace RGB n03961711/n03961711_5286.JPEG n03961711/n03961711_5286.JPEG
convert -negate -colorspace RGB n04033995/n04033995_2932.JPEG n04033995/n04033995_2932.JPEG
convert -negate -colorspace RGB n04258138/n04258138_17003.JPEG n04258138/n04258138_17003.JPEG
convert -negate -colorspace RGB n04264628/n04264628_27969.JPEG n04264628/n04264628_27969.JPEG
convert -negate -colorspace RGB n04336792/n04336792_7448.JPEG n04336792/n04336792_7448.JPEG
convert -negate -colorspace RGB n04371774/n04371774_5854.JPEG n04371774/n04371774_5854.JPEG
convert -negate -colorspace RGB n04596742/n04596742_4225.JPEG n04596742/n04596742_4225.JPEG
convert -negate -colorspace RGB n07583066/n07583066_647.JPEG n07583066/n07583066_647.JPEG
convert -negate -colorspace RGB n13037406/n13037406_4650.JPEG n13037406/n13037406_4650.JPEG
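Rather than hard-coding this list, candidate CMYK files could be found by reading the component count from each JPEG's frame header (the SOFn segment): 4 components indicates CMYK (or the related YCCK encoding), 3 indicates a colour image such as YCbCr, and 1 grayscale. A stdlib-only sketch under those assumptions; `jpeg_component_count` and `find_cmyk_jpegs` are hypothetical names, and the parser does not try to recover from every malformed stream.

```python
import os
import struct

# SOFn markers carry the frame header; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC)
# share the 0xC0-0xCF range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_component_count(data):
    """Return the number of colour components in a JPEG byte string, or None."""
    if not data.startswith(b"\xff\xd8"):
        return None
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            return None  # lost sync with the marker stream
        while i < len(data) and data[i] == 0xFF:
            i += 1  # skip 0xFF fill bytes before the marker code
        if i >= len(data):
            return None
        marker = data[i]
        i += 1
        if marker in (0xD9, 0xDA):  # EOI or start of scan reached without a frame header
            return None
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            continue  # standalone markers have no length field
        if i + 2 > len(data):
            return None
        (length,) = struct.unpack(">H", data[i:i + 2])
        if marker in SOF_MARKERS:
            # Layout after the marker: length(2) precision(1) height(2) width(2) Nf(1)
            return data[i + 7] if i + 8 <= len(data) else None
        i += length  # length includes its own two bytes
    return None


def find_cmyk_jpegs(root):
    """Return sorted paths of .JPEG files under root whose frame header has 4 components."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.upper().endswith(".JPEG"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    if jpeg_component_count(f.read()) == 4:
                        hits.append(path)
    return sorted(hits)
```

The resulting paths could then be fed to the same `convert` commands.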

Then you can remove the EXIF data:

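import glob
import os
import piexif

# glob does not expand "~", so resolve the home directory explicitly.
# recursive=True (Python 3.5+) lets "**" match every class subfolder.
pattern = os.path.expanduser('~/ImageNet/**/*.JPEG')

nfiles = 0
for filename in glob.iglob(pattern, recursive=True):
    nfiles += 1
    print("About to process file %d, which is %s." % (nfiles, filename))
    # Strip the EXIF block in place; piexif fails on files that are not really JPEGs.
    piexif.remove(filename)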
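To confirm the script did its job, or to find files it skipped, you could check each file for a remaining EXIF block. A stdlib-only sketch, assuming EXIF is stored the standard way: as an APP1 segment whose payload begins with `Exif\0\0`, located before the start-of-scan marker. `has_exif_segment` and `files_with_exif` are hypothetical names.

```python
import os
import struct


def has_exif_segment(data):
    """Return True if a JPEG byte string still contains an APP1 'Exif' segment."""
    if not data.startswith(b"\xff\xd8"):
        return False
    i = 2
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker == 0xFF:              # fill byte: move forward one and re-read
            i += 1
            continue
        if marker in (0xD9, 0xDA):      # EOI or start of scan: no more metadata segments
            return False
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            i += 2                      # standalone marker, no length field
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker == 0xE1 and data[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length                 # marker (2 bytes) + segment incl. its length field
    return False


def files_with_exif(root):
    """Return sorted paths of .JPEG files under root that still carry EXIF."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.upper().endswith(".JPEG"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    if has_exif_segment(f.read()):
                        hits.append(path)
    return sorted(hits)
```

An empty result from `files_with_exif(os.path.expanduser('~/ImageNet'))` would indicate every file was stripped. XMP metadata also lives in APP1 but with a different header, so it is deliberately not counted.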
