What is the best way to recover from errors in our dataset’s __getitem__ function and skip the problematic item(s) in the dataset without crashing? For example, an image might fail to load because of some problem (e.g., when loading from a network disk).

import os

from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, image_dir, image_list_file, transform=None):
        self.transform = transform
        self.image_dir = image_dir
        # pickle_load is our own helper (defined elsewhere); it returns
        # a list of (image_file, label) pairs
        self.image_list = pickle_load(image_list_file)

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, index):
        image_file, label = self.image_list[index]
        image_path = os.path.join(self.image_dir, image_file)
        # This is the call that can fail, e.g. on a flaky network disk
        image = Image.open(image_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, label

for images, labels in dataloader:
    # process the batch: if __getitem__ raises an error, the program crashes here
    ...

I am thinking of wrapping the for loop (for images, labels in dataloader) in a try/except block. Is there a better way you know of?
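
To make the try/except idea concrete, here is a minimal sketch of one variant: the guard sits around the single item load instead of around the whole loop, and failed items are dropped before the batch is assembled. The names SkipOnError and make_skipping_collate are hypothetical, invented for this illustration, and catching OSError by default is an assumption about which failures should be skipped. The sketch is plain Python so that it runs on its own; it is not a built-in PyTorch feature.

```python
import logging
from typing import Any, Callable, List, Optional, Tuple, Type


class SkipOnError:
    """Hypothetical wrapper: returns None instead of raising when an item fails to load."""

    def __init__(self, dataset: Any,
                 exceptions: Tuple[Type[BaseException], ...] = (OSError,)):
        self.dataset = dataset
        # Assumption: only I/O-style errors are worth skipping; anything
        # else (e.g. a bug in the transform) should still surface.
        self.exceptions = exceptions

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Optional[Any]:
        try:
            return self.dataset[index]
        except self.exceptions as exc:
            logging.warning("skipping item %d: %r", index, exc)
            return None  # marker for "this item could not be loaded"


def make_skipping_collate(
    base_collate: Callable[[List[Any]], Any]
) -> Callable[[List[Any]], Any]:
    """Build a collate function that drops None items before calling base_collate."""

    def collate(batch: List[Any]) -> Any:
        kept = [item for item in batch if item is not None]
        if not kept:
            return None  # every item in this batch failed
        return base_collate(kept)

    return collate
```

With PyTorch, this would presumably be wired up as DataLoader(SkipOnError(MyDataset(...)), collate_fn=make_skipping_collate(torch.utils.data.default_collate)), with an "if batch is None: continue" check at the top of the training loop before unpacking images and labels. One consequence of this design is that a batch containing failed items comes out smaller than batch_size.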