Thanks for the information.
If I understand the use case correctly, you would have 884*3000 images in each epoch, where each of the original 884 images will be randomly transformed 3000 times.
In that case, my previous proposal should work, and this code snippet shows what I meant:
```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data, length):
        self.data = data
        self.data_len = len(self.data)
        self.len = length  # artificial epoch length

    def __getitem__(self, index):
        # wrap the index so samples are reused once index >= data_len
        data_idx = index % self.data_len
        print('index {}, data_idx {}'.format(index, data_idx))
        x = self.data[data_idx]
        return x

    def __len__(self):
        return self.len

data = torch.randn(10, 1)
length = 30
dataset = MyDataset(data, length)
loader = DataLoader(dataset, batch_size=2)

for x in loader:
    print(x.shape)
```
Basically, you artificially increase the number of samples by passing the desired length directly to the dataset; inside the __getitem__ method, the modulo operation wraps the index so that the underlying data is sampled repeatedly.
Let me know if this would work for you.
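For your concrete use case (each of the 884 images randomly transformed 3000 times per epoch), you could combine the same modulo trick with a transform applied in __getitem__, so a fresh random transformation is drawn on every access. Here is a minimal sketch; the class name, the `repeats` argument, and the noise-adding lambda standing in for your real augmentation are just placeholders for illustration:

```python
import torch
from torch.utils.data import Dataset

class AugmentedDataset(Dataset):
    def __init__(self, data, repeats, transform=None):
        self.data = data
        self.data_len = len(data)
        self.repeats = repeats        # how often each image appears per epoch
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index % self.data_len]
        if self.transform is not None:
            # the transform is called on every access, so each of the
            # `repeats` occurrences of an image gets a new random version
            x = self.transform(x)
        return x

    def __len__(self):
        return self.data_len * self.repeats

# e.g. 884 images, each seen 3000 times per epoch
data = torch.randn(884, 3, 32, 32)
dataset = AugmentedDataset(
    data,
    repeats=3000,
    transform=lambda t: t + 0.01 * torch.randn_like(t),  # placeholder augmentation
)
print(len(dataset))  # 884 * 3000 = 2652000
```

If you use torchvision transforms instead of the placeholder lambda, note that many of them expect PIL images, so you may need to convert your tensors accordingly.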