I am writing a simple custom Dataset class (which I will add more features to later) for a segmentation dataset, but the (image, mask) pair returned by the __getitem__() method does not match: the returned mask belongs to a different image than the one that is returned. My directory structure is /home/bohare/data/images and /home/bohare/data/masks.
Following is the code I have:
import torch
from torch.utils.data.dataset import Dataset
from PIL import Image
import glob
import os
import matplotlib.pyplot as plt


class CustomDataset(Dataset):
    def __init__(self, folder_path):
        self.img_files = glob.glob(os.path.join(folder_path, 'images', '*.png'))
        self.mask_files = glob.glob(os.path.join(folder_path, 'masks', '*.png'))

    def __getitem__(self, index):
        image = Image.open(self.img_files[index])
        mask = Image.open(self.mask_files[index])
        return image, mask

    def __len__(self):
        return len(self.img_files)


data = CustomDataset(folder_path='/home/bohare/data')
len(data)
This correctly returns the total size of the dataset.
But when I use:
img, msk = data.__getitem__(n)
where n is the index of any (image, mask) pair and I plot the image and mask, they do not correspond to one another.
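One way to confirm the mismatch without plotting is to compare the file names stored at each index. The following is a minimal sketch that assumes each mask has the same file name as its image (an assumption; the directory listing above does not show the file names). `find_mismatches` is a hypothetical helper, not part of PyTorch.

```python
import os


def find_mismatches(img_files, mask_files):
    """Return the indices at which image and mask file names differ.

    Assumption (hypothetical): a mask shares its image's file name.
    """
    return [
        i
        for i, (img, msk) in enumerate(zip(img_files, mask_files))
        if os.path.basename(img) != os.path.basename(msk)
    ]
```

Calling `find_mismatches(data.img_files, data.mask_files)` returns an empty list only if the two lists line up index by index.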
How can I modify the code to make sure the correct (image, mask) pair is returned? Thanks for the help.
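A likely cause is that `glob.glob` returns paths in arbitrary order, so the two lists are not guaranteed to line up index by index. Below is a minimal sketch of one possible fix, assuming each mask has the same file name as its image: sort both lists and check that the names agree. `paired_files` is a hypothetical helper introduced for illustration.

```python
import glob
import os


def paired_files(folder_path):
    """Return (img_files, mask_files) sorted so that equal indices match.

    Assumption (hypothetical naming): a mask has the same file name as
    its image, e.g. images/0001.png <-> masks/0001.png.
    """
    img_files = sorted(glob.glob(os.path.join(folder_path, 'images', '*.png')))
    mask_files = sorted(glob.glob(os.path.join(folder_path, 'masks', '*.png')))

    # Fail early if the two folders do not contain the same file names.
    img_names = [os.path.basename(p) for p in img_files]
    mask_names = [os.path.basename(p) for p in mask_files]
    if img_names != mask_names:
        raise ValueError('image and mask file names do not match')

    return img_files, mask_files
```

In `__init__`, the two `glob.glob` calls could then be replaced by `self.img_files, self.mask_files = paired_files(folder_path)`. If the masks use a different naming scheme, the comparison would need to be adapted to it.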