Recover from errors in data loader: What is the best way?

blackberry · October 17, 2019, 6:51pm

What is the best way to recover from errors in our dataset’s __getitem__ function and skip the problematic item(s) in the dataset without crashing? For example, an image might fail to load because of some problem (e.g., when loading from a network disk).

import os

from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, image_dir, image_list_file, transform=None):
        self.transform = transform
        self.image_dir = image_dir
        # pickle_load is the poster's own helper (not shown); it yields (image_file, label) pairs
        self.image_list = pickle_load(image_list_file)

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, index):
        image_file, label = self.image_list[index]
        image_path = os.path.join(self.image_dir, image_file)
        # Image.open raises if the file is missing or unreadable
        image = Image.open(image_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, label

for images, labels in dataloader:
    # process the batch; if __getitem__ raises an error, the program crashes here
    ...

I am thinking of wrapping the for loop (for images, labels in dataloader) in a try/except block. Is there a better way?

JuanFMontesinos · October 17, 2019, 6:53pm

Write a “fake” getitem that does the actual loading, then call it from the real __getitem__ inside a try/except. If it fails, generate a random idx and try again.
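
One way to read that suggestion, as a minimal sketch rather than a definitive implementation: the loading logic moves into a helper that is allowed to raise, and __getitem__ catches the failure and retries with a random index. The names SkipBadItemsDataset, _load, load_fn, max_retries, and flaky_load are all made up for illustration. The class is plain Python so it runs without PyTorch; in real code it would subclass torch.utils.data.Dataset, and load_fn would be the Image.open(...) call from the question.

```python
import random


class SkipBadItemsDataset:
    """Map-style dataset sketch; real code would subclass torch.utils.data.Dataset."""

    def __init__(self, items, load_fn, max_retries=10):
        self.items = items              # list of (payload, label) pairs
        self.load_fn = load_fn          # hypothetical loader that may raise
        self.max_retries = max_retries  # hypothetical cap to avoid looping forever

    def __len__(self):
        return len(self.items)

    def _load(self, index):
        # The "fake" getitem: does the real work and is allowed to raise.
        payload, label = self.items[index]
        return self.load_fn(payload), label

    def __getitem__(self, index):
        # The real getitem: catch the failure and retry with a random index.
        for _ in range(self.max_retries):
            try:
                return self._load(index)
            except Exception as exc:  # narrow this (e.g. to OSError) in real code
                print(f"skipping item {index}: {exc!r}")
                index = random.randrange(len(self))
        raise RuntimeError(f"no loadable item after {self.max_retries} attempts")


def flaky_load(payload):
    # Stand-in for Image.open(...): fails on payloads marked as broken.
    if payload.startswith("bad"):
        raise OSError(f"cannot read {payload}")
    return payload.upper()


dataset = SkipBadItemsDataset(
    [("a.jpg", 0), ("bad.jpg", 1), ("c.jpg", 2)], flaky_load
)
samples = [dataset[i] for i in range(len(dataset))]
```

Note that the broken item is replaced by a randomly chosen one, so some items can show up more than once per epoch. That is usually harmless when only a few items fail, but it does change what the batches contain.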