Adding data to existing dataloader

Hi,
I create training, validation and testing data loaders for MNIST as follows:

import torch
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets

train_set = datasets.MNIST(root=data_root, train=True, transform=transform_train, download=True)
valid_set = datasets.MNIST(root=data_root, train=True, transform=transform_test, download=False)
test_set = datasets.MNIST(root=data_root, train=False, transform=transform_test, download=False)
                          
# Split training into train and validation
train_size = 600
valid_size = 59400
# Shuffle all indices once so the two subsets do not overlap
indices = torch.randperm(len(train_set))
train_indices = indices[:len(indices)-valid_size][:train_size or None]
valid_indices = indices[len(indices)-valid_size:] if valid_size else None

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                           sampler=SubsetRandomSampler(train_indices), **kwargs)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, **kwargs)
if valid_size:
    valid_loader = torch.utils.data.DataLoader(valid_set, batch_size=batch_size,
                                               sampler=SubsetRandomSampler(valid_indices), **kwargs)
else:
    valid_loader = None

Now I would like to transform the training data and then add the transformed data to the existing training data to form a new training set, something like this:

# Now transform the training data and add the new transformed data to existing training data
for data, target in train_loader:
    t_ims = ut.transform_ims(data.numpy(), [parameters])
    t_data = torch.from_numpy(t_ims)
    
    # Concatenate
    data = [data, t_data]
    target = [target, target]
    
    # Set new training data
    train_loader.data = data
    train_loader.target = target

Could you please tell me how to do that? I find the structure of the DataLoader in PyTorch really difficult to understand.
Thank you very much for your help!!

Do you want to save the new transformed data, or just use it in training?

Thanks. I would just like to use it for training.

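If you do something like this (the example uses CIFAR-10 sizes and statistics; for MNIST, use a 28-pixel crop and a single-channel mean and std):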

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

Then the transformed data will be automatically used in training. Is that what you wanted?
See full example here with train / test transforms.
https://github.com/QuantScientist/Deep-Learning-Boot-Camp/blob/master/day%2002%20PyTORCH%20and%20PyCUDA/PyTorch/21-PyTorch-CIFAR-10-Custom-data-loader-from-scratch.ipynb
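
To illustrate what "automatically used" means here: a transform attached to a dataset runs every time a sample is fetched, so a random transform yields a different variant of the same image on each pass. The following is only a minimal sketch under assumptions. It uses synthetic tensors in place of MNIST, and `TransformOnAccess` and `add_noise` are hypothetical names, not part of torchvision.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TransformOnAccess(Dataset):
    """Hypothetical wrapper: applies `transform` each time an item is read."""

    def __init__(self, images, targets, transform=None):
        self.images = images
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        if self.transform is not None:
            img = self.transform(img)  # runs on every access, not once up front
        return img, self.targets[idx]


def add_noise(img):
    # Stand-in for a random augmentation: returns a new, perturbed tensor.
    return img + 0.1 * torch.randn_like(img)


images = torch.rand(8, 1, 28, 28)        # synthetic stand-in for MNIST images
targets = torch.randint(0, 10, (8,))
stored_copy = images.clone()

dataset = TransformOnAccess(images, targets, transform=add_noise)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Reading the same index twice gives two different augmented variants.
first_read, _ = dataset[0]
second_read, _ = dataset[0]
batch_images, batch_targets = next(iter(loader))
```

The stored images are never modified, so nothing is saved or duplicated. The augmentation exists only in the batches the loader hands to the training loop.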

Thanks, @QuantScientist. Is it possible to apply a more sophisticated composition of transformations (e.g. blurring)? The transformation I used is something like this:

t_ims = ut.transform_ims(data.numpy(), [zoom_level, rot_angle, tx, ty, blur_sigma])

In addition, is it possible to add two or more transformations to the training data instead of one?
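
One possible way to approach both questions, sketched under assumptions: any Python callable can act as a transform, and `torch.utils.data.ConcatDataset` can join the original dataset with one or more transformed views of it. In the sketch below, `TransformedView`, `flip_horizontal`, and `box_blur` are hypothetical stand-ins (the real `ut.transform_ims` is not shown in the thread), and synthetic tensors replace MNIST.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader, Dataset, TensorDataset


class TransformedView(Dataset):
    """Hypothetical wrapper: the samples of `base`, passed through `transform`."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]
        return self.transform(img), target


def flip_horizontal(img):
    # Mirror a (C, H, W) image along its width axis.
    return torch.flip(img, dims=[-1])


def box_blur(img):
    # 3x3 mean filter as a simple stand-in for a Gaussian blur.
    # avg_pool2d is given a batch dimension, which is removed afterwards.
    blurred = F.avg_pool2d(img.unsqueeze(0), kernel_size=3, stride=1, padding=1)
    return blurred.squeeze(0)


# Synthetic stand-in for the 600-image training subset.
base = TensorDataset(torch.rand(600, 1, 28, 28), torch.randint(0, 10, (600,)))

# Original data plus two transformed copies: 3 x 600 samples in total.
augmented = ConcatDataset([
    base,
    TransformedView(base, flip_horizontal),
    TransformedView(base, box_blur),
])
train_loader = DataLoader(augmented, batch_size=64, shuffle=True)

batch_images, batch_targets = next(iter(train_loader))
```

With the setup from the first post, `base` could presumably be `torch.utils.data.Subset(train_set, train_indices.tolist())`, with `shuffle=True` taking the place of the `SubsetRandomSampler`. The transformed copies are computed on access rather than stored, so the extra memory cost is small.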