EOFError on the Dataloader

Luis_Costa · February 1, 2019, 6:59am

  File "inception_test.py", line 203, in <module>
    main(args)
  File "inception_test.py", line 190, in main
    num_epochs=args.epochs)
  File "inception_test.py", line 110, in train_model
    for data in dataloaders[phase]:
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
EOFError: Traceback (most recent call last):
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torchvision/datasets/folder.py", line 101, in __getitem__
    sample = self.loader(path)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torchvision/datasets/folder.py", line 147, in default_loader
    return pil_loader(path)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/torchvision/datasets/folder.py", line 130, in pil_loader
    return img.convert('RGB')
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/PIL/Image.py", line 915, in convert
    self.load()
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/PIL/ImageFile.py", line 250, in load
    self.load_end()
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 677, in load_end
    self.png.call(cid, pos, length)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 140, in call
    return getattr(self, "chunk_" + cid.decode('ascii'))(pos, length)
  File "/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder_preprocessing/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 356, in chunk_IDAT
    raise EOFError
EOFError

I’m getting this error when training my InceptionV3 model using PyTorch. It happens randomly.
Does this have to do with a corrupt image? Is there any way to ignore this error and proceed to the next image?

Here’s my implementation.

Any idea what’s going on?

rasbt · February 1, 2019, 7:02am

The randomness could be due to dataset shuffling. I recommend iterating through the dataset (without training, just iterating, to keep it fast), printing the image names, and checking whether it always fails on the same images. Then open those images manually to see if there’s an issue with them.
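A minimal sketch of that scan, assuming a map-style dataset (one that supports len() and integer indexing). The name find_failing_samples is made up for illustration. It indexes each sample once, in order, so shuffling plays no role:

```python
def find_failing_samples(dataset):
    """Index every sample once and collect the ones that raise an exception."""
    failures = []
    for index in range(len(dataset)):
        try:
            dataset[index]  # triggers file loading and any transforms
        except Exception as exc:  # deliberately broad: we only want a report
            failures.append((index, repr(exc)))
    return failures
```

If the dataset is a torchvision ImageFolder, the file behind a failing index should be available as dataset.samples[index][0], so the returned indices can be mapped back to paths.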

Luis_Costa · February 1, 2019, 7:07am

I think so too. Any way to get the image name from the DataLoader?

rasbt · February 1, 2019, 7:27am

Depends on your dataset/dataloader. If you have a custom dataset like this:

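import os

import pandas as pd
from PIL import Image
from torch.utils.data import DataLoader, Dataset

class CelebaDataset(Dataset):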
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, txt_path, img_dir, transform=None):
    
        df = pd.read_csv(txt_path, sep=" ", index_col=0)
        self.img_dir = img_dir
        self.txt_path = txt_path
        self.img_names = df.index.values
        self.y = df['Male'].values
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir,
                                      self.img_names[index]))
        
        if self.transform is not None:
            img = self.transform(img)
        
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]

train_dataset = CelebaDataset(txt_path='celeba_gender_attr_train.txt',
                              img_dir='img_align_celeba/',
                              transform=custom_transform)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=128,
                          shuffle=True,
                          num_workers=4)

You could print the image name in the __getitem__ method (here: self.img_names[index]). But it really depends on what your dataset looks like.
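If the dataset class itself can't be edited, one option is to wrap it. The sketch below is only an illustration: SkipUnreadable and max_tries are made-up names, and it assumes that substituting the next readable sample is acceptable, which means that sample shows up twice in the epoch. It catches EOFError and OSError, logs the failing index, and moves on:

```python
class SkipUnreadable:
    """Wrap a map-style dataset; on a load error, log it and try the next index."""

    def __init__(self, dataset, max_tries=10):
        self.dataset = dataset
        self.max_tries = max_tries  # hypothetical limit to avoid looping forever

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        for offset in range(self.max_tries):
            candidate = (index + offset) % len(self.dataset)
            try:
                return self.dataset[candidate]
            except (EOFError, OSError) as exc:
                print(f"skipping sample {candidate}: {exc!r}")
        raise RuntimeError(f"no readable sample found starting at index {index}")
```

Since the wrapper provides __getitem__ and __len__, it should be usable in place of the original dataset when constructing the DataLoader. Deleting or repairing the bad files is still the cleaner fix. This only keeps training from crashing in the meantime.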

Luis_Costa · February 1, 2019, 7:29am

I’m using the datasets from torchvision, so I guess I can’t change that.

rasbt · February 2, 2019, 12:20am

Hm, the datasets from torchvision shouldn’t have any errors though, I think. I tried most of them in the past without issues. Maybe the initial download of these datasets got corrupted or so? Maybe try deleting the local datasets (there should be a ./data folder relative to the directory where your code is) so that a fresh copy is downloaded the next time you run it.
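Since the traceback ends in the PNG plugin's IDAT handling, a truncated file is a plausible culprit. Below is a rough sketch for spotting such files with the standard library only. The function name png_is_complete is made up, and the check is purely structural: it verifies the 8-byte PNG signature, walks the chunks, checks each CRC, and expects a final IEND chunk. It does not decode pixel data, so it won't catch every kind of corruption.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_complete(path):
    """Return True if the file has a PNG signature and intact chunks up to IEND."""
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return False
        while True:
            header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(header) != 8:
                return False  # file ended before an IEND chunk was seen
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            crc = f.read(4)
            if len(data) != length or len(crc) != 4:
                return False  # chunk is cut short
            expected = struct.unpack(">I", crc)[0]
            if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != expected:
                return False  # chunk contents do not match the stored CRC
            if chunk_type == b"IEND":
                return True
```

Running this over the image folder (for example with os.walk) and listing the paths where it returns False gives a shortlist of files to re-download or delete.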

Luis_Costa · February 2, 2019, 2:44am

I fixed it by deleting the corrupted files. It took some time, but I did it. Now I’m getting this error, though, which does not make much sense.