Possible to skip bad items in data loader?


#1

Some rows in my data are bad. Is it possible to skip them, or to return None for bad data? I've tried returning None, but it dies in the pipeline. Can this be done without modifying the core PyTorch libraries?

Traceback (most recent call last):
  File "train.py", line 125, in <module>
    main(args)
  File "train.py", line 53, in main
    for i, (images, captions, lengths) in enumerate(data_loader):
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 212, in __next__
    return self._process_next_batch(batch)
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 239, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 41, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/jtoy/sandbox/sketchnet/pytorch/data_loader.py", line 44, in __getitem__
    image = Image.open(os.path.join(path, "image.jpg")).convert('RGB')
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/PIL/Image.py", line 844, in convert
    self.load()
  File "/home/jtoy/anaconda3/lib/python3.6/site-packages/PIL/ImageFile.py", line 226, in load
    "(%d bytes not processed)" % len(b))


#2

This thread might help you: "Questions about Dataloader and Dataset".
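For anyone landing here without the linked thread at hand, one commonly used pattern is to let the dataset return None for a bad item and then filter those out in the collate function before the batch is assembled. The sketch below is only an illustration of that idea, not necessarily what the linked thread recommends. The names CaptionDataset, load, and skip_none_collate are hypothetical, and the core logic is kept in plain Python so it does not depend on a particular PyTorch version.

```python
from typing import Any, Callable, List, Optional, Sequence


class CaptionDataset:
    """Hypothetical map-style dataset that returns None for unreadable samples."""

    def __init__(self, paths: Sequence[str], load: Callable[[str], Any]):
        self.paths = paths
        self.load = load  # assumed helper, e.g. opens an image and converts it

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Optional[Any]:
        try:
            return self.load(self.paths[index])
        except OSError:
            # Truncated or corrupt file: signal "skip me" instead of raising.
            return None


def skip_none_collate(
    batch: List[Any], base_collate: Callable[[List[Any]], Any]
) -> Optional[Any]:
    """Drop None samples, then hand the survivors to the regular collate function."""
    kept = [sample for sample in batch if sample is not None]
    if not kept:
        # Every sample in this batch was bad; the training loop must skip it.
        return None
    return base_collate(kept)
```

With PyTorch this could be wired up roughly as DataLoader(dataset, batch_size=32, collate_fn=functools.partial(skip_none_collate, base_collate=default_collate)), where default_collate comes from torch.utils.data.dataloader, or where base_collate is whatever custom collate function the loader already uses. Two consequences to keep in mind: some batches will be smaller than batch_size, and a batch made up entirely of bad items comes back as None, so the loop needs an "if batch is None: continue" check.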


#3

Exactly what I needed to fix the issue, thank you!


(Amogh Mannekote) #4

@deepcode @smth Check out nonechucks - a library I wrote for PyTorch that allows you to do exactly that (and more)!
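Without assuming anything about how nonechucks is implemented or what its API looks like, the general idea of wrapping a dataset so that bad samples are skipped can be sketched in a few lines. SkipBadDataset below is a hypothetical name, and falling through to the next index is just one possible policy.

```python
from typing import Any


class SkipBadDataset:
    """Hypothetical wrapper: on a bad sample, fall through to the next index."""

    def __init__(self, dataset: Any, errors: tuple = (OSError,)):
        self.dataset = dataset
        self.errors = errors  # exception types treated as "bad sample"

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Any:
        n = len(self.dataset)
        # Try the requested index first, then wrap around through the rest.
        for offset in range(n):
            candidate = (index + offset) % n
            try:
                sample = self.dataset[candidate]
            except self.errors:
                continue  # unreadable item, try the next one
            if sample is not None:
                return sample
        raise RuntimeError("no valid samples in dataset")
```

Because the wrapper only needs __len__ and __getitem__, it can be passed to a DataLoader in place of the original dataset. Batches stay full-size, but the neighbours of bad items get sampled more often, which may matter if bad items are frequent.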