Hello everyone,
I hope you are doing well. I am stuck on a problem: I have read many blog posts about it, but none of them offers a real solution.
The problem is that when I loop through my data loader (I am using the CheXpert dataset), I find NoneType objects instead of images.
The structure of the dataset is as follows: the root directory is CheXpert-v1.0-small, and inside it there are two folders containing the images and two CSV files containing the image paths and the labels.
So I created a custom dataset:
import os

import cv2
import pandas as pd
import torch
from torch.utils.data import Dataset


class CXRDataset(Dataset):
    '''
    TODO: add some explanation about this class
    '''
    def __init__(self, root_dir=None, csv_file=None, transform=None):
        if root_dir is not None and csv_file is not None:
            self.root_dir = root_dir
            self.csv_file = csv_file
            self.cursor = 0
            self.annotations = pd.read_csv(os.path.join(self.root_dir, self.csv_file))
            self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __repr__(self):
        return "Test root :%s csv :%s id :%s" % (self.root_dir, self.csv_file, self.cursor)

    def __getitem__(self, index):
        self.cursor = index
        image_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
        # cv2.imread returns None (it does not raise) if the file is missing or unreadable
        image = cv2.imread(image_path)
        y_label = torch.tensor(list(self.annotations.iloc[index, 2:]))
        if self.transform:
            # the transform should contain the initial pre-processing
            image = self.transform(image)
        return image, y_label
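Since cv2.imread returns None instead of raising when it cannot read a file, one thing worth ruling out first is that the paths stored in the CSV actually resolve to files under the root directory. Below is a minimal sketch of such a check using only the standard library. The helper name find_missing_images and the column name "Path" are assumptions for illustration; adjust the column name to whatever the CSV header really contains.

```python
import csv
from pathlib import Path


def find_missing_images(root_dir, csv_file, path_column="Path"):
    """Return (row_number, path) pairs whose image file does not exist on disk.

    `path_column` is an assumed header name; change it to match the CSV.
    """
    root = Path(root_dir)
    missing = []
    with open(root / csv_file, newline="") as handle:
        for row_number, row in enumerate(csv.DictReader(handle)):
            # Join the same way the dataset does: root_dir + path stored in the CSV
            candidate = root / row[path_column]
            if not candidate.is_file():
                missing.append((row_number, str(candidate)))
    return missing
```

Calling find_missing_images(root_source, csv_train_source) would list every row whose joined path does not exist. If nearly all rows are reported, the likely cause is how the path is joined (for example, the CSV path already including the root folder name) or which column is being read, rather than damaged image files.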
Then I create the datasets:
train_dataset = CXRDataset(root_source, csv_train_source, transform)
valid_dataset = CXRDataset(root_source, csv_valid_source, transform)
and then the data loaders:
train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=16, shuffle=True)
Now I want to check that my train_loader and valid_loader don't contain any NoneType objects.
I tried this code, but it does not work:
for dataset in [train_loader, valid_loader]:
    for batch in dataset:
        img_batch, label_batch = batch
        for idx in range(len(label_batch)):
            image = img_batch[idx].squeeze()
            label = label_batch[idx]
            if image is None or not any(label):
                print([dataset])
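A loop over the loaders is unlikely to catch this, for two reasons: the default collate function of DataLoader raises a TypeError when a sample contains None, so such a batch never reaches the loop body, and an element indexed out of a batched tensor is itself a tensor, so `image is None` can never be true there. A check at the dataset level, one sample at a time, avoids both issues. The following is only a sketch of that idea; the helper name find_none_samples is hypothetical, and it assumes the dataset returns an (image, label) pair as CXRDataset does.

```python
def find_none_samples(dataset):
    """Return (index, reason) pairs for samples that are None or fail to load.

    Works on any object with __len__ and __getitem__ returning (image, label).
    """
    bad_samples = []
    for index in range(len(dataset)):
        try:
            image, _label = dataset[index]
        except Exception as error:
            # For example, a transform that cannot handle a None image
            bad_samples.append((index, repr(error)))
            continue
        if image is None:
            bad_samples.append((index, "image is None"))
    return bad_samples
```

For example, find_none_samples(train_dataset) would return the indices to inspect in the CSV. It reads every image once, so it can be slow on a large dataset; running it on a dataset created with transform=None keeps the check focused on file loading.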
So, is there any advice on how to check my data before feeding it to the NN?
Thanks in advance.