How to modify and use a data loader?

jarvico · November 17, 2020, 1:29pm

Hi,

I need to use a modified version of data loader in my study.

Assume that I have a basic train loader like this:

train_data = datasets.MNIST(root='../../Data', train=True, download=False, transform=transforms.ToTensor())

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=False)

First I use it in the beginning.

But then, for a different task, I need to add noise to all samples in the train dataset, and then use the noisy data through a new data loader.

I can update all samples with noise using the code below, but I don’t know how to save that modified train dataset as a data loader and use it later in different code sections.

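import torch

def noise(x, eps, clip_min, clip_max):
    # Sample zero-mean Gaussian noise with std eps, matching x's shape, dtype and device.
    eta = torch.randn_like(x) * eps
    adv_x = x + eta
    # Clamp back to the valid pixel range when both bounds are given.
    if clip_min is not None and clip_max is not None:
        adv_x = torch.clamp(adv_x, min=clip_min, max=clip_max)
    return adv_x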


my_noisy_train_loader = train_loader

for i, (image, label) in enumerate(my_noisy_train_loader):
    image = noise(image, 0.3, 0, 1)
    # How to update noisy train loader?

Could you please suggest how I can modify the data loader and use it afterwards?

Or maybe I need to create a custom modified dataset (a noisy MNIST dataset, let’s say) and load it with a new data loader, but I also do not know how to modify and save datasets.MNIST so that I can use it later on.

albanD · November 17, 2020, 2:47pm

Hi,

I think that the simplest solution is to create a custom transform that adds this noise. You can then pass that transform when you create your Dataset and keep using the DataLoader as before.
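A minimal sketch of such a transform, assuming it runs on tensors (that is, after transforms.ToTensor()). The class name AddGaussianNoise and its defaults are hypothetical, chosen to mirror the noise function from the question. A random tensor stands in for an MNIST image so the snippet runs without the dataset on disk.

```python
import torch


class AddGaussianNoise:
    """Add zero-mean Gaussian noise to a tensor image, then optionally clamp.

    Hypothetical helper for illustration; not part of torchvision.
    """

    def __init__(self, eps=0.3, clip_min=0.0, clip_max=1.0):
        self.eps = eps
        self.clip_min = clip_min
        self.clip_max = clip_max

    def __call__(self, x):
        # randn_like keeps the shape, dtype and device of x.
        noisy = x + torch.randn_like(x) * self.eps
        if self.clip_min is not None and self.clip_max is not None:
            noisy = torch.clamp(noisy, min=self.clip_min, max=self.clip_max)
        return noisy


# Stand-in for one MNIST image after ToTensor(): shape (1, 28, 28), values in [0, 1].
image = torch.rand(1, 28, 28)
noisy_image = AddGaussianNoise(eps=0.3)(image)
```

With torchvision this would typically be passed as transform=transforms.Compose([transforms.ToTensor(), AddGaussianNoise(0.3)]) when constructing datasets.MNIST. Note that the noise is then re-sampled every time a sample is fetched, so each epoch sees different noise.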

jarvico · November 17, 2020, 2:55pm

Thanks for your prompt reply.

The point is that I will need not only a noisy dataset, but later also perturbed datasets. For example, all the data in the dataset will need to be perturbed with different adversarial attack types such as FGSM, BIM, CW, etc.

That’s why I thought I could create a custom dataset for each case, like noisy_MNIST_dataset, BIM_MNIST dataset, etc.

I cannot meet this need using a transform…

Wesley_Neill · November 17, 2020, 2:55pm

To just add to what @albanD mentioned, here is an excellent resource for custom datasets and transforms:

Writing Custom Datasets, DataLoaders and Transforms — PyTorch Tutorials 1.6.0 documentation

albanD · November 17, 2020, 3:02pm

If you want these to be “permanent” datasets, then you might be able to modify the dataset and save it again on disk. You can later load it again as a TensorDataset (for mnist it shouldn’t be too large).

jarvico · November 17, 2020, 3:17pm

Actually, I do not know how to modify the data and save it as a new dataset. Could you share a piece of code to guide me on this?

Wesley_Neill · November 17, 2020, 3:19pm

Is there any reason that these perturbations can’t be in the form of a custom transform? As far as I know, what you are describing is the exact use case of transforms.

albanD · November 17, 2020, 3:29pm

The mnist dataset is actually a file with two Tensors, one for images and one for labels. You can see how it’s loaded here: https://github.com/pytorch/vision/blob/74de51d6d478e289135d9274e6af550a9bfba137/torchvision/datasets/mnist.py#L89

So you can just modify these two Tensors and save them again.
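One way this could look, as a rough sketch rather than a definitive recipe. Random tensors stand in for the MNIST image and label tensors (with a real torchvision MNIST object these are exposed as train_data.data and train_data.targets); the file name noisy_mnist.pt and the dictionary keys are assumptions made for the example.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the two MNIST tensors (assumption: uint8 images, integer labels).
images = torch.randint(0, 256, (100, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (100,))

# Scale to [0, 1] and add a channel dimension, mirroring what ToTensor() produces.
x = images.float().div(255).unsqueeze(1)
noisy_x = torch.clamp(x + torch.randn_like(x) * 0.3, 0.0, 1.0)

# Save the modified tensors once (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "noisy_mnist.pt")
torch.save({"images": noisy_x, "labels": labels}, path)

# Later, possibly in another script: reload and wrap in a TensorDataset.
saved = torch.load(path)
noisy_data = TensorDataset(saved["images"], saved["labels"])
noisy_loader = DataLoader(noisy_data, batch_size=32, shuffle=False)
batch_images, batch_labels = next(iter(noisy_loader))
```

Because the noise is baked into the saved tensors, every epoch and every run sees exactly the same perturbed samples, which is the "permanent" behaviour asked for. The same pattern would apply to adversarial examples computed once and stored.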

Deeply · November 18, 2020, 4:15pm

In order to have flexibility to play with the noise, I wouldn’t save a noise-added dataset.
As for your case where you want to add more perturbations, the transform approach also works. You can have several conditional transforms if needed, i.e. to add or not add noise, followed by other perturbations as needed. Here’s one example from one of my projects:

train_transform = transforms.Compose([
    ImageThinning(p=cf.thinning_threshold) if cf.thinning_threshold < 1 else NoneTransform(),
    OverlayImage(cf) if cf.overlay_image else NoneTransform(),  # Add random image background here, to mimic painting
    PadImage((cf.MAX_IMAGE_WIDTH, cf.MAX_IMAGE_HEIGHT)) if cf.pad_images else NoneTransform(),
    transforms.Resize(cf.input_size) if cf.resize_images else NoneTransform(),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)) if not RGB_img else NoneTransform(),  # in case the image has a single channel
    transforms.Normalize(mu, std) if cf.normalize_images else NoneTransform(),
])

ImageThinning, OverlayImage, and PadImage are custom transforms.

You still need to code your customized transforms (if you can’t find one from PyTorch that works for you)… Here’s an example for the (custom) ImageThinning transform shown above.

class ImageThinning(object):
    """  Image input as PIL and output as PIL
        To be used as part of  torchvision.transforms
       Args: p, a threshold value to control image thinning        
    """
    def __init__(self, p = 0.2):
        self.p = p                  
        
    def __call__(self, image):
        image = image_thinning(image, self.p)         
        return image
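The Compose example above also relies on NoneTransform, which is not shown in the post. Presumably it is a no-op placeholder that lets a disabled step be skipped without changing the shape of the list. A minimal sketch under that assumption:

```python
class NoneTransform:
    """Identity transform: returns its input unchanged.

    Assumed behaviour; the original definition is not shown in the thread.
    """

    def __call__(self, image):
        return image


# The same object comes back, whatever its type (PIL image, tensor, ...).
sample = [1, 2, 3]
result = NoneTransform()(sample)
```

Using an identity object instead of building the list conditionally keeps the pipeline definition in one readable expression, at the cost of a few trivial extra calls per sample.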

The other approach (as Wesley mentioned) is to define a custom dataset class that wraps train_data = datasets.MNIST... and applies the noise or any other perturbation at the __getitem__ level.
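A sketch of that wrapper idea, with hypothetical names (PerturbedDataset, gaussian_noise). A TensorDataset of random images stands in for datasets.MNIST so the snippet runs on its own; any dataset that returns (image, label) pairs with tensor images should work the same way.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class PerturbedDataset(Dataset):
    """Wrap a base dataset and apply `perturb` to each image on access."""

    def __init__(self, base, perturb):
        self.base = base
        self.perturb = perturb

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        image, label = self.base[index]
        return self.perturb(image), label


def gaussian_noise(x, eps=0.3):
    # Zero-mean Gaussian noise with std eps, clamped to the [0, 1] pixel range.
    return torch.clamp(x + torch.randn_like(x) * eps, 0.0, 1.0)


# Stand-in for MNIST: 64 random images with labels in 0..9.
base = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
noisy_loader = DataLoader(PerturbedDataset(base, gaussian_noise), batch_size=16)
images, labels = next(iter(noisy_loader))
```

Swapping gaussian_noise for another callable gives a different perturbed dataset without touching the loader code. Gradient-based attacks such as FGSM need the model and the label, so they would require a callable with a different signature, or might be easier to precompute and save.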

Either way, you’d have to get your hands dirty with coding. Good luck!