Help with DataLoader reading images?

Hello everyone,

I encountered an error when defining a custom dataset for the PyTorch DataLoader.
For each sample, I want to read the image from its path, derive the label from that same path, and return the image and label. Below is my code for the dataset.

 import glob

 from skimage import io
 from torch.utils.data import Dataset


 class Non_ImageFolder_Dataset(Dataset):
     def __init__(self, image_path_list=None, transform=None):
         self.samples = image_path_list
         if self.samples is None:
             self.samples = sorted(glob.glob('<path to images>'))
         self.targets = []

         # Build one label per image from the components of its path.
         for ii in self.samples:
             slide_name = ii.split('/')[-4]
             marker_name = ii.split('/')[-2]
             # Placeholder kept from the original post; the real expression was elided.
             roi_name = [**messy stuff***]
             image_label = marker_name + '_' + slide_name + '_' + roi_name[0]
             self.targets.append(image_label)

     def __len__(self):
         return len(self.samples)

     def __getitem__(self, idx):
         image_path = self.samples[idx]
         image = io.imread(image_path)  # the error is raised here
         image_label = self.targets[idx]

         return image, image_label

The error is raised in __getitem__, at the io.imread(image_path) call, and produces the stack trace below. The images don't seem to be corrupt, and the file paths look OK; I tried both absolute paths and paths relative to the current directory. Do you see any issue based on this stack trace? I don't really know how to interpret it.

    for images, labels in dataloader:
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/vivek/Analytics/InHouseProjects/Clustering/IIC_folder/cluster/data.py", line 83, in __getitem__
    image = io.imread(image_path)
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 209, in call_plugin
    return func(*args, **kwargs)
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/skimage/io/plugins/tifffile_plugin.py", line 37, in imread
    return tif.asarray(**kwargs)
  File "/home/vivek/.local/share/virtualenvs/PyTorch-rjjGfeb_/lib/python3.7/site-packages/tifffile/tifffile.py", line 3005, in asarray
    series = self.series[series]
IndexError: list index out of range

Thank you very much!

Since the error is raised inside tifffile (the last frame of the stack trace), it seems that the TIFF image is indeed somehow “broken”.
Could you iterate over the dataset to find which index raises the error, and then check that file manually?
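
A minimal sketch of that check, in case it helps. The helper name `find_unreadable_files` and its `load` parameter are made up for illustration; the idea is simply to try loading every path outside the DataLoader and record which ones raise, so the failing index and file are reported instead of only a stack trace.

```python
from typing import Callable, Iterable, List, Tuple


def find_unreadable_files(
    paths: Iterable[str],
    load: Callable[[str], object],
) -> List[Tuple[int, str, str]]:
    """Try to load every path and collect (index, path, error) for each failure."""
    failures = []
    for idx, path in enumerate(paths):
        try:
            load(path)
        except Exception as exc:  # broad on purpose: any read failure is of interest
            failures.append((idx, path, f"{type(exc).__name__}: {exc}"))
    return failures


# Intended usage (assumes scikit-image is installed and `dataset` is an
# instance of the Non_ImageFolder_Dataset class from the question):
#   from skimage import io
#   for idx, path, err in find_unreadable_files(dataset.samples, io.imread):
#       print(idx, path, err)
```

Passing the reader in as `load` keeps the loop independent of the image library, so the same check should work with whatever function `__getitem__` uses to read files.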

You are right. Many of the files had not finished copying to my data folder when I stopped the transfer, so several were broken. Thanks.
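
For anyone who hits the same problem after an interrupted copy: a rough way to spot truncated files, assuming the original source folder is still available, is to compare file sizes between source and destination. The function name `find_incomplete_copies`, its arguments, and the `*.tif` pattern below are illustrative assumptions, not something from my actual setup. A matching size does not prove a file is intact, but a missing file or a size mismatch is a strong hint that the copy did not finish.

```python
from pathlib import Path
from typing import List


def find_incomplete_copies(src_dir: str, dst_dir: str, pattern: str = "*.tif") -> List[str]:
    """Return relative paths whose copy in dst_dir is missing or differs in size."""
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    suspect = []
    for src_file in sorted(src_root.rglob(pattern)):
        rel = src_file.relative_to(src_root)
        dst_file = dst_root / rel
        # A missing destination file or a size mismatch suggests an unfinished copy.
        if not dst_file.is_file() or dst_file.stat().st_size != src_file.stat().st_size:
            suspect.append(rel.as_posix())
    return suspect
```

The returned paths are relative to the source folder, so they can be copied again selectively.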