Data augmentation for labels and images?

import os

from PIL import Image
from import Dataset

class Mydata(Dataset):
    def __init__(self, root_dir, seg_dir, transforms=None):
        self.root_dir = root_dir
        self.seg_dir = seg_dir
        self.transforms = transforms
        self.files = os.listdir(self.root_dir)
        self.labels = os.listdir(self.seg_dir)

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img_name = self.files[idx]
        label_name = self.labels[idx]
        img =, img_name))
        label =, label_name))
        if self.transforms:
            img = self.transforms(img)
            label = self.transforms(label)
        return img, label
full_dataset = Mydata('/training', 'label')
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_dataset, val_dataset =, [train_size, val_size])

In the code above I am trying to do data augmentation / affine transformations.
I do not know if they are similar or not.
How can I do it? Is it supposed to be done after dividing into val and training, or before?

What do you mean by “they are similar”?
The transformations will be applied on both datasets after splitting.

By similar I mean data augmentation and affine transformation.

You can use affine transformations like rotation as data augmentation.
I’m still not sure if I misunderstand the question, but since you passed the transformations to your Dataset, they will be used for each sample (also after splitting the Dataset).

Sorry, my question was not clear. What I mean is: after dividing full_dataset into val and training:

train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
train_dataset, val_dataset =, [train_size, val_size])

is data augmentation supposed to happen after this point, or after this:

train_loader = data.DataLoader(train_dataset, shuffle=False, batch_size=bs)
val_loader = data.DataLoader(val_dataset, shuffle=False, batch_size=bs)

Thanks for clarifying the question, as I’ve indeed misunderstood it.
The data augmentation (transformation) will be applied lazily, i.e. while each sample is being loaded.
E.g. if you get the sample at index 0 using x, y = train_dataset[0], the transformations will be applied live at this line of code while executing __getitem__.
The same applies for drawing batches from your DataLoader. While creating the batch, each sample will be drawn from the Dataset by calling its __getitem__, such that the transformation will be applied live again.
Does this answer your question?
