Torchvision.transforms: How to perform identical transform on both image and target?

You might want to try batchgenerators from MIC-DKFZ for your 3D data. It supports multiprocessing and natively handles both 2D and 3D.

Also, in case people are not aware, the albumentations project allows you to do pairwise transforms of all kinds on source + target.


Hello @ptrblck, when applying crop and flip transforms to 3D data of shape DxHxW (stored in a tensor), is there a better way than applying the transforms separately to each slice? The transforms only work on PIL images, so converting every slice to a PIL image, transforming it, and converting it back to a tensor would be too much work.
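For crop and flip specifically, one option is to skip PIL and index the tensors directly. The following is only a minimal sketch, assuming the volume and the target are both (D, H, W) tensors of the same shape; `paired_crop_flip_3d` and its parameters are hypothetical names, not part of torchvision.

```python
import random

import torch


def paired_crop_flip_3d(volume, target, crop_size):
    """Apply the same random crop and random flips to two (D, H, W) tensors."""
    if volume.shape != target.shape:
        raise ValueError("volume and target must have the same shape")
    d, h, w = volume.shape
    cd, ch, cw = crop_size

    # Sample one crop origin and reuse it for both tensors.
    z = random.randint(0, d - cd)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    volume = volume[z:z + cd, y:y + ch, x:x + cw]
    target = target[z:z + cd, y:y + ch, x:x + cw]

    # Decide once per axis whether to flip, then flip both tensors together.
    flip_dims = [dim for dim in range(3) if random.random() > 0.5]
    if flip_dims:
        volume = torch.flip(volume, dims=flip_dims)
        target = torch.flip(target, dims=flip_dims)
    return volume, target
```

Because the random decisions are drawn once and applied to both tensors, the volume and its target stay aligned without any per-slice conversion.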

I love working with the imgaug library for image augmentation. Check this example out!

I tried this code on my dataset (both the images and the labels are images), but I am getting this error: FileNotFoundError: [Errno 2] No such file or directory: ‘E’

Try to print the path before loading the data or mask image.
Did you pass a list of paths or just a single string?

I pass them like this: train_dataset = MyDataset(…/train/images , …/train/labels)

You should pass a list of image and label paths to it:

image_paths = ['./data/0.png', './data/1.png']
target_paths = ['./target/0.png', './target/1.png']
dataset = MyDataset(image_paths, target_paths)

Thank you @ptrblck, but if I have 100 images, do I have to write out all their paths this way?

You would use some utility functions from glob or os to get all the paths.
Just make sure the image and target paths correspond to each other, i.e. avoid just sorting one list etc.
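One way to collect the paths with standard-library tools is sketched below. It assumes that each target file has the same file name as its image, which is an assumption and not something stated in this thread; `paired_paths` is a hypothetical helper name.

```python
from pathlib import Path


def paired_paths(image_dir, target_dir, pattern="*.png"):
    """Return image and target path lists that correspond index by index."""
    image_paths = sorted(Path(image_dir).glob(pattern))
    # Look targets up by file stem instead of relying on two independent sorts.
    targets = {p.stem: p for p in Path(target_dir).glob(pattern)}

    missing = [p.name for p in image_paths if p.stem not in targets]
    if missing:
        raise FileNotFoundError(f"No target found for: {missing}")

    target_paths = [targets[p.stem] for p in image_paths]
    return [str(p) for p in image_paths], [str(p) for p in target_paths]
```

The two returned lists can then be passed on, e.g. `dataset = MyDataset(*paired_paths('./data', './target'))`. Matching by stem fails loudly when a target is missing instead of silently shifting the pairing.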

Thank you @ptrblck.
I did it like this:

class MyDataset(data.Dataset):
    def __init__(self, image_paths, target_paths, train=True):
        self.image_paths = image_paths
        self.target_paths = target_paths
        self.files = os.listdir(self.image_paths)
        self.lables = os.listdir(self.target_paths)

    def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(520, 520))
        image = resize(image)
        mask = resize(mask)

        # Random crop
        i, j, h, w = transforms.RandomCrop.get_params(
            image, output_size=(512, 512))
        image = TF.crop(image, i, j, h, w)
        mask = TF.crop(mask, i, j, h, w)

        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)

        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)

        # Transform to tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        return image, mask
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self,idx):
        img_name = self.files[idx]
        label_name = self.lables[idx]
        image = Image.open(os.path.join(self.image_paths, img_name))
        mask = Image.open(os.path.join(self.target_paths, label_name))
        x, y = self.transform(image, mask)
        return x, y

It works fine with the DataLoader, but when I train the model I get this error:

IndexError: list index out of range

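self.image_paths seems to be the path to the folder containing the images, so len(self.image_paths) returns the length of that string rather than the number of images.
Return len(self.files) in def __len__(self) instead.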

Hi, thanks @ptrblck. Could you give me a small example of applying the same random rotation to both the input image and the target?

Sorry if I’m a bit late to reply here but if you’re trying to load image pairs (e.g. for segmentation purposes) you may want to look at the code in pywick’s FolderDataset. It will let you specify a root and relative target directories and will auto-map your source images to your target images. You can also specify co-transforms on your images so that both source and target tensors are manipulated in the same way and don’t get out of sync. Using the full framework just to accomplish that one task would be overkill but you can probably just yank the FolderDataset class to use in your own application.

For full disclosure, I’m the owner of the pywick framework.

Thanks a lot for all the input.

So far I have only seen scattered API extensions, none of which is explicitly documented or can handle all data input.

Can we have a “ground truth” standard on this?

Isn’t that exactly the problem though? There are so many use cases that it’s not quite as simple as handling “all data input”. For academic datasets like COCO etc pytorch already provides loaders (as do many other people). The trick is to enable data loading that supports other use cases. I’ve tried to address some of them in my implementation and make it generic enough to be useful in many situations but it will never fit everyone (e.g. a 3-d sliced dataset would have totally different requirements). So I would say the “ground truth” is different for everyone.

I am building a dataset for training a Siamese network, in which I have to pass a pair of images in the forward pass (an anchor image and a positive/negative image). I have to apply torchvision transforms to the pair of images, but I am not able to do this.

Any suggestions?

Thanks in advance!

Are the suggestions given in this topic (e.g. this) not working for your use case?

It helped, thanks!

Hi everyone!

I tried what @ptrblck did in this post: Torchvision.transforms: How to perform identical transform on both image and target?
My problem is that self.transform seems to accept only one mask/target, but I have several masks for each image.
How can I resolve this?

I can provide my code if needed.

Thanks!