When I go through my dataloader, I find NoneType objects!

Hello everyone,
I hope you are doing well. I am stuck on a big problem: I have read lots of blogs about it, but none of them offers a real solution.
The problem is that when I loop through my data loader (I am using the CheXpert dataset), I find NoneType objects instead of images.
The structure of the dataset is:
[image: dataset directory structure]

The root directory is CheXpert-v1.0-small. Inside it, there are two folders containing the images and two CSV files containing the path to each image and its labels.

So I created a custom dataset:

import os

import cv2
import pandas as pd
import torch
from torch.utils.data import Dataset


class CXRDataset(Dataset):
  '''
  Add some explanation about this class
  '''
  def __init__(self, root_dir = None, csv_file = None, transform=None):
    if root_dir is not None and csv_file is not None:
      self.root_dir = root_dir
      self.csv_file = csv_file
      self.cursor = 0  # index of the most recently requested sample
      self.annotations = pd.read_csv(os.path.join(self.root_dir, self.csv_file))
      self.transform = transform

  def __len__(self):
    return len(self.annotations)

  def __repr__(self):
    return "Test root :% s csv :% s id :% s" % (self.root_dir, self.csv_file, self.cursor)

  def __getitem__(self, index):
    self.cursor = index
    # column 1 of the CSV holds the image path, relative to root_dir
    image_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
    # cv2.imread returns None (without raising) when the file cannot be read
    image = cv2.imread(image_path)

    # the remaining columns (2 onward) hold the labels
    y_label = torch.tensor(list(self.annotations.iloc[index, 2:]))

    if self.transform:
      # the transform should contain the initial pre-processing
      image = self.transform(image)

    return image, y_label

Then I load the data:

train_dataset = CXRDataset(root_source, csv_train_source, transform)
valid_dataset = CXRDataset(root_source, csv_valid_source, transform)

Then I create the data loaders:

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=16, shuffle=True)

Now I want to check that my train_loader and valid_loader don’t contain any NoneType objects.

I used this code, but it does not work:

for dataset in [train_loader, valid_loader]:
  for batch in dataset:
    img_batch, label_batch = batch
    for idx in range(len(label_batch)):
      image = img_batch[idx].squeeze()
      label = label_batch[idx]
      if image is None or not any(label):
        print([dataset])

So please, does anyone have advice on how to check my data before feeding it to the NN?

Thanks in advance.

I’m not sure what exactly “it does not work” means regarding the last code snippet, i.e. whether the check itself is not working or you are seeing None samples and are unsure how to fix them.
In any case, I would first narrow down why None objects are returned and fix that afterwards.
To do so, use a batch_size of 1, disable shuffling, iterate the DataLoader, and check which index returns the None objects. Once found, index the dataset directly with this index and step through __getitem__ to isolate what’s failing.
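
A rough sketch of that procedure, in case it helps. It is not code from this thread: the helper `find_bad_indices` and the `ToyDataset` stand-in are hypothetical names made up for illustration. It indexes the dataset directly and in order, which visits the same samples as a DataLoader with batch_size=1 and no shuffling but skips the collate step, and it records every index whose image is None or whose `__getitem__` raises.

```python
from typing import Any, List, Tuple


def find_bad_indices(dataset: Any) -> List[Tuple[int, str]]:
    """Return (index, reason) for each sample that is None or fails to load.

    `dataset` is assumed to support len() and integer indexing and to
    return (image, label) pairs, like the CXRDataset in the question.
    """
    bad = []
    for index in range(len(dataset)):
        try:
            image, _label = dataset[index]
        except Exception as exc:  # e.g. a transform failing on a None image
            bad.append((index, f"{type(exc).__name__}: {exc}"))
            continue
        if image is None:
            bad.append((index, "image is None"))
    return bad


class ToyDataset:
    """Hypothetical stand-in for CXRDataset; index 2 simulates an unreadable file."""

    def __init__(self):
        self.images = ["img0", "img1", None, "img3"]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], 0


print(find_bad_indices(ToyDataset()))  # [(2, 'image is None')]
```

With the real dataset this would be called as `find_bad_indices(train_dataset)`. Since cv2.imread returns None instead of raising when a file is missing or unreadable, a natural next step for each reported index is to print the image_path built in `__getitem__` and check it with os.path.exists.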

I got it, thank you very much!