Possible to skip bad items in data loader?

I have some rows in my data that are bad. Is it possible to skip them, or to return None for bad data? I've tried returning None, but it dies in the pipeline. Is it possible to get this kind of functionality without modifying the core PyTorch libraries?

Traceback (most recent call last):
  File "train.py", line 125, in <module>
    main(args)
  File "train.py", line 53, in main
    for i, (images, captions, lengths) in enumerate(data_loader):
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 212, in __next__
    return self._process_next_batch(batch)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/jtoy/sandbox/sketchnet/pytorch/data_loader.py", line 44, in __getitem__
    image = Image.open(os.path.join(path, "image.jpg")).convert('RGB')
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/PIL/Image.py", line 844, in convert
    self.load()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/PIL/ImageFile.py", line 226, in load
    "(%d bytes not processed)" % len(b))

This thread might help you: "Questions about Dataloader and Dataset".
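For anyone landing here later, one common way to do this without touching PyTorch itself is to catch the error in the dataset, return None for that row, and pass the DataLoader a custom collate_fn that drops the None entries before batching. The following is only a minimal sketch under assumptions: SafeDataset, collate_skip_none and ToyDataset are made-up names for illustration, and the choice to catch OSError is based on the traceback above, so adjust it to whatever your own dataset raises.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate


class SafeDataset(Dataset):
    """Hypothetical wrapper: returns None instead of raising on a bad row."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        try:
            return self.dataset[index]
        except OSError:
            # Bad row (e.g. a truncated image): mark it so the collate function can drop it.
            return None


def collate_skip_none(batch):
    """Drop None samples, then defer to PyTorch's default collation."""
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        # Every sample in this batch was bad; the training loop has to handle None.
        return None
    return default_collate(batch)


class ToyDataset(Dataset):
    """Stand-in data: index 2 raises OSError, as a corrupt image might."""

    def __len__(self):
        return 6

    def __getitem__(self, index):
        if index == 2:
            raise OSError("image file is truncated")
        return torch.tensor([float(index)]), index


loader = DataLoader(SafeDataset(ToyDataset()), batch_size=3, collate_fn=collate_skip_none)

# Count how many samples survive in each batch; fully bad batches are skipped.
batch_sizes = [len(batch[1]) for batch in loader if batch is not None]
print(batch_sizes)
```

Note that batches containing a bad row come out smaller than batch_size, and a batch made up entirely of bad rows comes back as None, so the training loop needs a check for that.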

Exactly what I needed to fix the issue, thank you!

@deepcode @smth Check out nonechucks - a library I wrote for PyTorch that allows you to do exactly that (and more)!

Thanks, was totally searching for a simple way to filter without getting deeper into the Dataloaders.

@msamogh

nonechucks doesn't allow iteration over very large datasets, though, because of the memory leak issue. It would be great if this were resolved!
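I don't know what causes that leak, but for anyone who only needs to skip unreadable rows, one option that keeps no state between calls is a thin wrapper that falls through to the next index when an item fails to load. This is a rough sketch, not how nonechucks works: SkipToNextGood, max_tries and FlakyData are hypothetical names, and OSError is assumed to be the failure mode, as in the traceback at the top of the thread.

```python
class SkipToNextGood:
    """Hypothetical wrapper: on a bad row, fall through to the next index."""

    def __init__(self, dataset, max_tries=10):
        self.dataset = dataset
        # Assumed cap so a long run of bad rows cannot loop forever.
        self.max_tries = max_tries

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        for offset in range(self.max_tries):
            try:
                # Wrap around at the end of the dataset.
                return self.dataset[(index + offset) % len(self.dataset)]
            except OSError:
                continue  # bad row: try its neighbour instead
        raise RuntimeError(
            f"no readable item within {self.max_tries} rows of index {index}"
        )


class FlakyData:
    """Stand-in data: index 2 raises OSError, as a corrupt image might."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        if index == 2:
            raise OSError("image file is truncated")
        return index


wrapped = SkipToNextGood(FlakyData())
items = [wrapped[i] for i in range(len(wrapped))]
print(items)
```

The trade-off is that the neighbour of a bad row gets returned twice per epoch, which is usually harmless when bad rows are rare but does skew sampling slightly. Since the wrapper only needs `__len__` and `__getitem__`, it should work as a map-style dataset passed to a DataLoader.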
