When I go through my dataloader, I find NoneType objects!

Aymen_Sekhri · September 10, 2021, 1:39pm

Hello everyone,
I hope you are doing well. I am stuck on a big problem: I have read lots of blog posts about it, but none of them offers a real solution.
The problem is that when I loop through my data loader (I am using the CheXpert dataset), I find NoneType objects instead of images.
The structure of the dataset is as follows:

The root directory is CheXpert-v1.0-small. Inside it, there are two folders containing the images and two CSV files containing the image paths and the labels.

So I created a custom dataset:

class CXRDataset(Dataset):
  '''
  add some explanation about this class
  '''
  def __init__(self, root_dir = None, csv_file = None, transform=None):
    if root_dir is not None and csv_file is not None:
      self.root_dir = root_dir
      self.csv_file = csv_file
      self.cursor = 0
      self.annotations = pd.read_csv(os.path.join(self.root_dir, self.csv_file))
      self.transform = transform

  def __len__(self):
    return len(self.annotations)

  def __repr__(self): 
    return "Test root :% s csv :% s id :% s" % (self.root_dir, self.csv_file, self.cursor) 

  def __getitem__(self, index):
    self.cursor = index
    image_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
    image = cv2.imread(image_path)

    y_label = torch.tensor(list(self.annotations.iloc[index, 2:]))
    
    if self.transform:
      # the transform should contain the initial pre-processing 
      image = self.transform(image)
    
    return image, y_label

and when I load the data:

train_dataset = CXRDataset(root_source, csv_train_source, transform)
valid_dataset = CXRDataset(root_source, csv_valid_source, transform)

then the data loader:

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=16, shuffle=True)

Now I want to check that my train_loader and valid_loader don’t contain any NoneType objects.

I used this code, but it does not work:

for dataset in [train_loader, valid_loader]:
  for batch in dataset:
    img_batch, label_batch = batch
    for idx in range(len(label_batch)):
      image = img_batch[idx].squeeze()
      label = label_batch[idx]
      if image is None or not any(label):
        print([dataset])

So please, is there any advice on how to check my data before feeding it to the NN?

Thanks in advance

ptrblck · September 11, 2021, 5:58am

I’m not sure what exactly “it does not work” means regarding the last code snippet, i.e. whether the check is not working or whether you are seeing None samples and are unsure how to fix them.
In any case, I would try to narrow down why None objects are returned and try to fix this afterwards.
To do so, use a batch_size of 1 and disable shuffling, iterate the DataLoader, and check which index returns the None objects. Once found, index the dataset using this index and check the __getitem__ further to isolate what’s failing.
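
The narrowing-down step above can be sketched roughly as follows. This is only a minimal sketch, not the fix the original poster ended up with: `find_bad_indices` and `ToyDataset` are hypothetical names made up for illustration. The helper indexes the dataset directly, which visits the samples in the same order as a DataLoader with batch_size=1 and shuffle=False, and it only assumes that the dataset implements __len__ and __getitem__. One thing worth checking in the dataset from the first post: cv2.imread returns None instead of raising when a file is missing or unreadable, so a wrong path in the CSV is a plausible source of None images.

```python
def find_bad_indices(dataset):
    """Return (index, reason) pairs for samples that are None or fail to load."""
    bad = []
    for index in range(len(dataset)):
        try:
            sample = dataset[index]
        except Exception as exc:  # e.g. a transform failing on a None image
            bad.append((index, f"raised {type(exc).__name__}: {exc}"))
            continue
        # Treat a tuple/list sample as its fields, anything else as one field.
        fields = sample if isinstance(sample, (tuple, list)) else (sample,)
        if any(field is None for field in fields):
            bad.append((index, "contains None"))
    return bad


class ToyDataset:
    """Hypothetical stand-in for a map-style dataset; index 2 has no image."""

    def __init__(self):
        self.images = ["img0", "img1", None, "img3"]
        self.labels = [0, 1, 0, 1]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], self.labels[index]


bad_indices = find_bad_indices(ToyDataset())
print(bad_indices)
```

Once an index is reported, calling dataset[index] by hand and stepping through __getitem__ shows which step fails (the path, the image read, or the transform).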

Aymen_Sekhri · September 12, 2021, 10:56am

I got it, thank you very much.

Nisarg_Doshi · April 9, 2023, 7:59pm

Hey, I’m stuck on the same problem while doing a similar classification task. How did you solve it, @Aymen_Sekhri?
Please help me.

y-vectorfield · March 7, 2024, 5:52am

Hello, I ran into the same problem.
I have not solved it yet.

y-vectorfield · March 7, 2024, 9:26am

I would like to return the following dict-type data from the DataLoader.

    def __getitem__(self, index):
        tmp_data = self.benchmark[index]

        return {"id": tmp_data["id"],
                "question": tmp_data["question"],
                "options": tmp_data["options"],
                "answer": tmp_data["answer"],
                "image": tmp_data["image"],
                "question_type": tmp_data["question_type"],
                "index2ans": tmp_data["index2ans"],
                "correct_choice": tmp_data["correct_choice"],
                "all_choices": tmp_data["all_choices"],
                "empty_prompt": tmp_data["empty_prompt"],
                "final_input_prompt": tmp_data["final_input_prompt"],
                "gt_content": tmp_data["gt_content"]}

However, when I use the loader, it returns NoneType data.
When I printed the return values of __getitem__, the correct values were output…

ptrblck · March 7, 2024, 1:42pm

Could you post a minimal and executable code snippet reproducing the issue using random data?

y-vectorfield · March 8, 2024, 2:52am

@ptrblck, thank you very much for your message.
Actually, I solved this problem.
I tried a simple type of dict data, and the model ran!
The cause of this issue was the data collator function.
I forgot to add the return statement to its __call__ method.
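
For later readers, the bug described above looks roughly like this. It is a minimal sketch with made-up names (`BrokenCollator`, `FixedCollator`) and only two of the dict keys from the earlier post, since the real collator was not shown. A DataLoader calls its collate_fn on a list of samples and yields whatever that callable returns, so a __call__ without a return statement produces None for every batch even though __getitem__ returns correct values.

```python
class BrokenCollator:
    """Builds the batch but forgets to return it, so callers get None."""

    def __call__(self, samples):
        batch = {key: [s[key] for s in samples] for key in samples[0]}
        # Missing `return batch`: Python implicitly returns None here.


class FixedCollator:
    """Merges a list of per-sample dicts into one dict of lists."""

    def __call__(self, samples):
        batch = {key: [s[key] for s in samples] for key in samples[0]}
        return batch


# Hypothetical samples shaped like the dicts returned by __getitem__.
samples = [
    {"id": "a", "question": "q1"},
    {"id": "b", "question": "q2"},
]

print(BrokenCollator()(samples))  # None
print(FixedCollator()(samples))
```

Either object could be passed to the loader as DataLoader(dataset, batch_size=2, collate_fn=FixedCollator()); with the broken one, every batch coming out of the loader is None.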