OSError: image file is truncated (28 bytes not processed) during training

I came here stuck on exactly the same issue.

Initially, setting:
ImageFile.LOAD_TRUNCATED_IMAGES = True
solved the problem, although in that initial case I was using num_workers=0.
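
For anyone copying that line: the flag lives in Pillow's PIL.ImageFile module, so a minimal standalone version looks like this (placing it at the top of the training script is my own suggestion):

from PIL import ImageFile

# Ask Pillow to decode as much of a truncated image as possible
# instead of raising OSError at decode time.
ImageFile.LOAD_TRUNCATED_IMAGES = True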

In my case the failure was reproducible: defining the loaders with num_workers > 0 would eventually throw the OSError at some point during training.

As I understand it, num_workers=0 means loading happens in the same process as the training loop, whereas num_workers > 0 spawns separate worker processes.

So my guess is that the spawned worker processes do not have ImageFile.LOAD_TRUNCATED_IMAGES = True set, so they fail when trying to load a corrupted image.

If that suspicion is correct, is there any way to propagate that setting to the spawned workers?
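
If it is, one way to propagate it (a sketch I have not fully verified on Windows, where workers are spawned rather than forked) is DataLoader's worker_init_fn hook, which runs inside each worker process before it loads any data; the batch_size here is arbitrary:

import torch
from PIL import ImageFile

def worker_init(worker_id):
    # Executed inside each spawned worker process, so the flag is set
    # in that process's own copy of the PIL.ImageFile module.
    ImageFile.LOAD_TRUNCATED_IMAGES = True

# train_dataset is whatever Dataset was already built above.
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=32,
    num_workers=10,
    worker_init_fn=worker_init,
)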

Possible confounding factors for my case:

  • this is on Windows, as my only machine with a GPU is a Windows box (VR rig in the office :sweat_smile:)
  • I am running a pre-release build of Pillow (6.1.0.dev0), due to encountering this issue with my dataset:
    https://github.com/python-pillow/Pillow/issues/3769

Having multiple workers was important for my application because ~75% of the total training time seems to be spent on something other than the actual computation, even with num_workers=10.

My manual fix was to use this code to go through my datasets and find the image that was causing problems:

import tqdm
from PIL import Image

# Note: run this with ImageFile.LOAD_TRUNCATED_IMAGES at its default
# (False), otherwise the truncated file loads "successfully" and the
# scan finds nothing.
for DUT in [train_dataset, valid_dataset, test_dataset]:
    # .imgs is the (path, class_index) list exposed by torchvision's
    # ImageFolder datasets.
    for fn, label in tqdm.tqdm(DUT.imgs):
        try:
            im = Image.open(fn)
            # Image.open() is lazy; convert() forces a full decode,
            # which is what actually hits the truncated data.
            im2 = im.convert('RGB')
        except OSError:
            print("Cannot load: {}".format(fn))

That found exactly one unloadable image in my case.
(For any of the other Udacity Deep Learning Nanodegree folks who find this via search: the unloadable file was dogImages/train\098.Leonberger\Leonberger_06571.jpg.)

I simply re-saved the file, which appears to have filled in the missing data, and the multi-worker loader approach now works.
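
For anyone wanting to repair such a file in place, a minimal sketch of that re-save (the path is the one from my dataset above; flipping the flag just for the rescue load, then restoring it, is my own suggestion):

from PIL import Image, ImageFile

# Temporarily allow Pillow to decode past the truncation...
ImageFile.LOAD_TRUNCATED_IMAGES = True

path = r"dogImages/train\098.Leonberger\Leonberger_06571.jpg"
im = Image.open(path).convert('RGB')

# ...and write the decoded pixels back out as a complete JPEG.
im.save(path, 'JPEG')

# Restore the default so other truncated files are still caught.
ImageFile.LOAD_TRUNCATED_IMAGES = False

Note that this re-encodes the image rather than recovering the original bytes, which is fine for training data but not a bit-exact repair.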


Thanks a lot for this post; I had exactly the same issue on my Windows 10 machine. I ended up simply removing the mentioned file from the dataset while keeping num_workers > 0, which resolved the issue!
