Creating training loops for data augmentation

amy2 · November 6, 2023, 9:33am

I want to apply data augmentation to my dataset (the "FaceLandmarks" dataset) by running multiple training loops: one on the dataset without transforms and another on the dataset with transforms, so that I get extra images for every sample. This is what I tried:

import os

import numpy as np
import pandas as pd
import torch
from skimage import io
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms


class FaceLandmarksDataset(Dataset):

    def __init__(self, csv_file, root_dir, transform=None):
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        self.transform = transforms.Compose([Rescale,
                                             RandomCrop,
                                             ToTensor()])
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])

        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks], dtype=float).reshape(-1, 2)
        # Start adding the loop to get the transformed version.
        # Assign a label indicating 'none' augmentation by default.
        aug = 'none'

        # Apply transformation if it exists and is a list with augmentation labels
        if self.transform:
            if isinstance(self.transform, list):
                aug = self.transform[idx % len(self.transform)]

        sample = {'image': image, 'landmarks': landmarks, 'aug': aug}

        if self.transform and aug != 'none':
            sample = self.transform(sample)
            sample = {'image': image, 'landmarks': landmarks}

        return sample

# DataLoader for the dataset ---------------------------------
transformed_dataset = FaceLandmarksDataset(
    csv_file='/faces_dataset/faces/faces/face_landmarks.csv',
    root_dir='C/faces_dataset/faces/faces/')

dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=0)

Then I got confused: how can I get both the transformed and the original samples from the DataLoader?

ptrblck · November 6, 2023, 3:19pm

You could create separate Dataset instances, one with and one without a transformation passed to its __init__ method. To do so, remove the re-creation of the transformation in __getitem__:

      self.transform = transforms.Compose([Rescale,
        RandomCrop,
        ToTensor()])

and use self.transform directly, as set in __init__.
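A minimal sketch of what that fix could look like. To stay runnable without the image files, it uses a hypothetical in-memory ToyLandmarksDataset with random tensors instead of reading the CSV, plus a hypothetical HorizontalFlip sample transform. Neither is from the original code. The only point is that the transform is stored once in __init__ and reused in __getitem__.

```python
import torch
from torch.utils.data import Dataset


class HorizontalFlip:
    """Hypothetical sample transform: mirrors the image and its landmarks."""

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        width = image.shape[-1]
        flipped = landmarks.clone()
        # Mirror the x coordinate (column 0) around the image width.
        flipped[:, 0] = (width - 1) - flipped[:, 0]
        return {'image': torch.flip(image, dims=[-1]), 'landmarks': flipped}


class ToyLandmarksDataset(Dataset):
    """Stand-in for FaceLandmarksDataset that uses random in-memory data."""

    def __init__(self, num_samples=8, transform=None):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(num_samples, 3, 16, 16, generator=generator)
        self.landmarks = torch.rand(num_samples, 5, 2, generator=generator) * 15
        # Store the transform once; __getitem__ must not overwrite it.
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = {'image': self.images[idx], 'landmarks': self.landmarks[idx]}
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# Same class, two instances: one raw, one augmented.
raw_dataset = ToyLandmarksDataset()
flipped_dataset = ToyLandmarksDataset(transform=HorizontalFlip())
```

Because the transform is an argument, the same class serves both cases and no second Dataset subclass is needed.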

amy2 · November 6, 2023, 4:09pm

So if I understand what you're saying correctly, I should make

class FaceLandmarksDataset(Dataset):

and another one

class AugmentedFaceLandmarksDataset(Dataset):

with the corresponding __init__, __getitem__, and __len__ methods,
and concatenate both for the DataLoader? @ptrblck

ptrblck · November 6, 2023, 4:55pm

No, you should fix the self.transform usage in your current FaceLandmarksDataset so that it just reuses the attribute initialized in __init__, and then create two datasets:

# Rescale, RandomCrop, and ToTensor are assumed to be the sample-dict transforms
# used in the question's code; torchvision.transforms has no Rescale class.
raw_dataset = FaceLandmarksDataset(csv_file, root_dir, transform=ToTensor())
transformed_dataset = FaceLandmarksDataset(
    csv_file, root_dir,
    transform=transforms.Compose([Rescale(...), RandomCrop(...), ToTensor()]))
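With two such datasets, the training side could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: ToyLandmarksDataset and add_noise are hypothetical stand-ins so the snippet runs without the image files, and the loop body only counts samples where a real forward/backward pass would go. ConcatDataset from torch.utils.data is one option for getting the original and the augmented copy of every sample from a single DataLoader.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class ToyLandmarksDataset(Dataset):
    """Hypothetical stand-in for FaceLandmarksDataset with in-memory data."""

    def __init__(self, num_samples=6, transform=None):
        self.images = torch.rand(num_samples, 1, 4, 4)
        self.landmarks = torch.zeros(num_samples, 5, 2)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = {'image': self.images[idx], 'landmarks': self.landmarks[idx]}
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


def add_noise(sample):
    """Hypothetical augmentation: perturbs the image, keeps the landmarks."""
    noisy = sample['image'] + 0.1 * torch.randn_like(sample['image'])
    return {'image': noisy, 'landmarks': sample['landmarks']}


raw_dataset = ToyLandmarksDataset()
transformed_dataset = ToyLandmarksDataset(transform=add_noise)

# Option 1: one loader per dataset, run as separate training loops.
raw_loader = DataLoader(raw_dataset, batch_size=4, shuffle=True, num_workers=0)
transformed_loader = DataLoader(transformed_dataset, batch_size=4,
                                shuffle=True, num_workers=0)

samples_seen = 0
for loader in (raw_loader, transformed_loader):
    for batch in loader:
        # A real loop would run the forward/backward pass here.
        samples_seen += batch['image'].shape[0]

# Option 2: chain both datasets so one loader yields originals and augmented copies.
combined_dataset = ConcatDataset([raw_dataset, transformed_dataset])
combined_loader = DataLoader(combined_dataset, batch_size=4,
                             shuffle=True, num_workers=0)
```

Option 2 mixes raw and augmented samples within each shuffled batch, while Option 1 keeps them in separate passes. Which one suits the training setup is a judgment call.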