'NoneType' object has no attribute 'numel'

I’ve created a custom dataset that allows me to read in images from a single folder and add the labels separately:

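import re
from pathlib import Path

import cv2
import numpy as np
from torch.utils.data import Dataset, DataLoader

class SingleFolderImage(Dataset):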
  def __init__(self, base_path, label_dict, transform=None):
    self.base_path = base_path
    self.image_paths = [str(f) for f in Path(self.base_path).glob('*')]
    self.transform = transform
    self.label_dict = label_dict
        
  def __getitem__(self, index):
    image_path = self.image_paths[index]
    image_name = re.split(r'/|\.', image_path)[-2]
    x = cv2.imread(image_path, -1)
    ratio = np.amax(x) / 255  # scale so the brightest pixel maps to 255 (256 would overflow uint8)
    x = (x / ratio).astype('uint8')
    if self.transform is not None:
        x = self.transform(x)
    if image_name in list(self.label_dict.keys()):
      y = self.label_dict[image_name]
    else:
      y = {'boxes': None, 'labels': None}
    return x, y
  
  def __len__(self):
    return len(self.image_paths)

I call it as follows:

train_dataset = SingleFolderImage(image_path, parallel_labels, transform = data_transforms)
train_dataloader = DataLoader(train_dataset, batch_size=100, shuffle=True, num_workers=2)

However, when I try to read some examples, I get the error in the title, with no useful way to isolate where the issue is. This didn’t happen before I added the if statement that handles images that aren’t in label_dict (although leaving it out of course leads to other issues). I really don’t see why that would affect anything; after all, I’m returning a y with the same type in both cases.

Any help would be most welcome…

Your DataLoader expects to receive valid tensors so that it can collate all received samples into a batch. If some of them contain None objects (as your {'boxes': None, 'labels': None} fallback does), an error is raised, as shown here:

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10)
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        if index % 2 == 0:
            return self.data[index]
        else:
            return None
        
dataset = MyDataset()
for data in dataset:
    print(data)
    
# tensor(0)
# None
# tensor(2)
# None
# tensor(4)
# None
# tensor(6)
# None
# tensor(8)
# None

loader = DataLoader(dataset, batch_size=1)
for data in loader:
    print(data)
# tensor([0])
# TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'NoneType'>


loader = DataLoader(dataset, batch_size=10)
for data in loader:
    print(data)
# TypeError: expected Tensor as element 1 in argument 0, but got NoneType

You might thus want to replace these None objects with a tensor containing an “invalid” or default value instead.
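For your detection-style targets, one way to do this is to return empty tensors instead of None for unlabeled images. Below is a minimal sketch, not a drop-in fix: ToyDetectionDataset and collate_as_lists are hypothetical names, the images are placeholder tensors, and I'm assuming your labels are dicts of tensors and that your model accepts empty targets. Since the number of boxes differs per image, the sketch also passes a custom collate_fn that keeps the targets as a list rather than trying to stack them.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDetectionDataset(Dataset):
    """Hypothetical stand-in for SingleFolderImage; some images have no labels."""

    def __init__(self, label_dict, num_images=4):
        self.label_dict = label_dict
        self.names = [f"img{i}" for i in range(num_images)]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        name = self.names[index]
        x = torch.zeros(3, 8, 8)  # placeholder image tensor
        if name in self.label_dict:
            y = self.label_dict[name]
        else:
            # Empty tensors instead of None, so every sample has the same structure.
            y = {
                "boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros((0,), dtype=torch.int64),
            }
        return x, y


def collate_as_lists(batch):
    # Stack the equally sized images, but keep the targets as a list,
    # because the number of boxes can differ from image to image.
    images, targets = zip(*batch)
    return torch.stack(images), list(targets)


label_dict = {
    "img0": {"boxes": torch.tensor([[0.0, 0.0, 4.0, 4.0]]),
             "labels": torch.tensor([1])},
    "img2": {"boxes": torch.tensor([[1.0, 1.0, 5.0, 5.0], [2.0, 2.0, 6.0, 6.0]]),
             "labels": torch.tensor([1, 2])},
}

dataset = ToyDetectionDataset(label_dict)
loader = DataLoader(dataset, batch_size=4, collate_fn=collate_as_lists)
images, targets = next(iter(loader))
```

Here images is a single stacked tensor and targets is a list with one dict per image; the unlabeled images simply carry zero boxes. If your images are not all the same size, you would also need to return them as a list instead of calling torch.stack.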