Hello,
I am working on an image segmentation task. There are two folders, “inputs” and “masks”, which hold the corresponding pairs of the dataset (i.e., inputs/ipt_xy.npy corresponds to masks/msk_xy.npy).
My dataset class looks like this:
import os

import numpy as np
from torch.utils.data import Dataset


class DatasetBlaBla(Dataset):
    def __init__(self, root_path, ipt, tgt, transform=None):
        super(DatasetBlaBla, self).__init__()
        self.root_path = root_path
        self.ipt = ipt  # name of the input subfolder
        self.tgt = tgt  # name of the target (mask) subfolder
        self.transform = transform

    def __len__(self):
        l1 = os.listdir(os.path.join(self.root_path, self.ipt))
        l2 = os.listdir(os.path.join(self.root_path, self.tgt))
        number_files_inp = len(l1)
        number_files_tgt = len(l2)
        # Fail loudly instead of implicitly returning None on a mismatch.
        if number_files_inp != number_files_tgt:
            raise RuntimeError("Input and target folders contain different numbers of files")
        return number_files_inp

    def __getitem__(self, idx):
        img_path_input_patch = os.path.join(self.root_path, self.ipt, f"ipt_{idx}.npy")
        img_path_tgt_patch = os.path.join(self.root_path, self.tgt, f"tgt_{idx}.npy")
        input_patch = np.load(img_path_input_patch)
        tgt_patch = np.load(img_path_tgt_patch)
        # The transform is applied to the input and the target separately.
        if self.transform:
            input_patch = self.transform(input_patch)
            tgt_patch = self.transform(tgt_patch)
        return input_patch, tgt_patch
I create a single dataset instance and then split it into train/val/test using:
train_set, val_set, test_set = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])
Finally, we come to the question:
What are best practices, in this case, to apply transformations on the train_set only?
I have looked through the forum and found a variety of approaches; however, I am still wondering whether there is a distinct best practice for my use case. Also, I do not want to split the folders into train/val/test subfolders, because reshuffling the files would be an issue with my amount of data.
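For reference, one of the approaches I came across is to create the base dataset without a transform and wrap only the training split in a small dataset that applies the augmentation. Below is a minimal sketch of how I understand it, not a definitive implementation. The names TransformedSubset and flip_pair are hypothetical, and a toy TensorDataset stands in for DatasetBlaBla(..., transform=None) so the snippet runs on its own:

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical wrapper: applies a transform to the samples of one split."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, idx):
        input_patch, tgt_patch = self.subset[idx]
        if self.transform:
            # The transform receives the pair, so a random augmentation can
            # use the same parameters for the input and its mask.
            input_patch, tgt_patch = self.transform(input_patch, tgt_patch)
        return input_patch, tgt_patch


def flip_pair(input_patch, tgt_patch):
    """Illustrative paired transform: horizontal flip of both tensors."""
    return torch.flip(input_patch, dims=[-1]), torch.flip(tgt_patch, dims=[-1])


# Toy stand-in for the real dataset: ten (input, mask) pairs of shape (1, 2, 2).
inputs = torch.arange(40.0).reshape(10, 1, 2, 2)
masks = inputs.clone()
dataset = TensorDataset(inputs, masks)

# A seeded generator keeps the split reproducible between runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(dataset, [6, 2, 2], generator=generator)

# Only the training split is wrapped; val_set and test_set stay untouched.
train_set = TransformedSubset(train_set, transform=flip_pair)
```

With this layout the files on disk are never moved, since random_split only stores indices into the single base dataset.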
Thank you for your help,
Cheers