Torchvision.transforms: How to perform identical transform on both image and target?

You could redefine self.transform to take multiple images/masks and iterate over them, applying the transformations to each pair:

    def transform(self, images, masks):
        # Resize
        resize = transforms.Resize(size=(520, 520))
        for idx in range(len(images)):
            images[idx] = resize(images[idx])
            masks[idx] = resize(masks[idx])
        ...

Would that work for you?
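
For the random transformations you could sample the parameters once and reuse them for both the image and the mask. A minimal sketch for a single image/mask pair (the 512x512 crop size and the 0.5 flip probability are just placeholder values):

    import random
    import torchvision.transforms as transforms
    import torchvision.transforms.functional as TF

    def transform(self, image, mask):
        # Resize image and mask with the same deterministic transform
        resize = transforms.Resize(size=(520, 520))
        image = resize(image)
        mask = resize(mask)

        # Random crop: sample the parameters once and apply them to both
        i, j, h, w = transforms.RandomCrop.get_params(image, output_size=(512, 512))
        image = TF.crop(image, i, j, h, w)
        mask = TF.crop(mask, i, j, h, w)

        # Random horizontal flip with a shared coin flip
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)

        # Transform to tensors
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        return image, mask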


It works! Thank you very much for your help and responsiveness!


I’ve now run into a problem when training my network.
I get the following error: AttributeError: 'list' object has no attribute 'to'
I think this is because I previously converted my masks to PIL Images to do the data augmentation and finally returned them as tensors, but with this shape: torch.Size([1, 256, 256])
However, I think my network expects this shape for the masks: torch.Size([256, 256])
Am I right? And if so, is there a way to turn the masks into 2-dimensional tensors?
Thanks!!

If you are returning a list of tensors, you could use masks = torch.stack(masks) to create a tensor.
However, since you are returning multiple masks for each sample, the batch dimension will be added additionally, so that inside your DataLoader loop you would get a tensor of shape [batch_size, nb_masks, height, width].
If you want to treat each mask separately as a sample, you could call view(-1, height, width) on it to put all masks into the batch dimension, but I’m not sure what your current use case is, so could you explain it a bit?

If you just want to remove the additional dimension in [1, 256, 256], use mask = mask.squeeze(0).
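
A small sketch of both options, assuming each mask in the list has shape [1, 256, 256]:

    import torch

    # Hypothetical list of masks as returned by __getitem__
    masks = [torch.zeros(1, 256, 256) for _ in range(4)]

    stacked = torch.stack(masks)   # shape: [4, 1, 256, 256]
    stacked = stacked.squeeze(1)   # shape: [4, 256, 256]

    # For a single mask of shape [1, 256, 256]:
    mask = masks[0].squeeze(0)     # shape: [256, 256]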


You’re right, masks = torch.stack(masks) is exactly what I need! Thanks again!!


Hi, how about ColorJitter? How can I apply ColorJitter to multiple input images, such as the left and right images in stereo matching? Thanks

You could get the parameters by calling the get_params method and apply the same transformation to each image.
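
A sketch of how this could look for a stereo pair; note that the get_params signature changed between torchvision releases, so this assumes a recent version where it returns the sampled factors (older versions returned a ready-to-apply transform instead):

    import torchvision.transforms as transforms
    import torchvision.transforms.functional as TF

    def color_jitter_pair(left, right, brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1):
        # Sample one set of jitter factors and apply it to both images
        fn_idx, b, c, s, h = transforms.ColorJitter.get_params(
            brightness=[max(0., 1 - brightness), 1 + brightness],
            contrast=[max(0., 1 - contrast), 1 + contrast],
            saturation=[max(0., 1 - saturation), 1 + saturation],
            hue=[-hue, hue],
        )
        for fn_id in fn_idx:
            if fn_id == 0 and b is not None:
                left, right = TF.adjust_brightness(left, b), TF.adjust_brightness(right, b)
            elif fn_id == 1 and c is not None:
                left, right = TF.adjust_contrast(left, c), TF.adjust_contrast(right, c)
            elif fn_id == 2 and s is not None:
                left, right = TF.adjust_saturation(left, s), TF.adjust_saturation(right, s)
            elif fn_id == 3 and h is not None:
                left, right = TF.adjust_hue(left, h), TF.adjust_hue(right, h)
        return left, right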

I’m also encountering this problem. Have you found a good solution?

At kornia.org we are also working on an API to cover similar cases. We plan to provide a set of functionalities compatible with torchvision and other data augmentation frameworks in order to perform data augmentation on raw torch tensors, meaning that it can be done in parallel on the GPU and you can backprop through it. Additionally, our API has a flag to choose whether to return the transform or concatenate the result of the transforms. See an example of horizontal flip here: https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.random_hflip

Can we just concatenate those two images into one, apply the transform, and then split them?

    gt_img    = [1, c, h, w]
    train_img = [1, c, h, w]
    cat_img   = [1, 2c, h, w]

Apply the transform on cat_img, then split the result?
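
A sketch of the idea, assuming the transform is purely geometric and a torchvision version that supports tensor inputs (per-channel color transforms such as ColorJitter would not behave correctly on the concatenated channels):

    import torch
    import torchvision.transforms as transforms

    # Hypothetical tensors for illustration
    gt_img = torch.rand(1, 3, 256, 256)
    train_img = torch.rand(1, 3, 256, 256)

    # Concatenate along the channel dimension so one random
    # geometric transform is applied to both images at once
    cat_img = torch.cat([gt_img, train_img], dim=1)   # [1, 6, 256, 256]

    transform = transforms.Compose([
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
    ])
    cat_out = transform(cat_img)

    # Split back into the two images
    gt_out, train_out = torch.split(cat_out, 3, dim=1)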

What if the mask does not have dimensions that a PIL Image can represent? For example, in a segmentation task the number of channels in a mask can be an arbitrary number. In that case, the torchvision transformations will not work.
I have asked the question in detail here.

In these cases it depends on the format the mask is stored in.
If e.g. PIL cannot load it, the mask is most likely not stored in a classical “image format” and might instead be stored as a numpy array, so it could be loaded via numpy.
Which format do your masks currently have?

For each image, I have the mask as a numpy array (.npy). The mask dimensions are (num_classes, image_height, image_width).

How can I use torchvision transforms on a numpy array?

I do not think you can use torchvision directly on a numpy array. However, you can convert your numpy array to a PIL Image and then use torchvision. You can use ToPILImage() to do that.

I cannot convert a numpy array to a PIL Image when the number of channels (num_classes in the numpy array) is not 3 (converted to an RGB image) or 1 (grayscale).

What is the shape of your numpy array?

The numpy array shape is (10, 800, 800), where 800 is the image height and width.

You could maybe make a list of 10 numpy arrays from this array, cast each of them to a PIL Image, use the torchvision transforms on these PIL Images, and afterwards convert them back to numpy arrays… Also, I think it is really important for the numpy array to have dtype uint8.

That is what I am doing right now. I am converting each channel to a grayscale PIL Image, doing the transforms on the PIL Image, and converting back to a numpy array. Then I concatenate all the numpy arrays to build the transformed mask.

The issue is that I loop over each channel, which significantly slows down the dataloader. Is there a way to vectorize this or to do it efficiently?

The latest torchvision release added support for tensors in some transformations, so you could check if the transformation you are using allows this.
If not, the vectorization depends on the type of transformation you would like to apply.
E.g. random cropping could be performed with a single slicing operation on the tensor.
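
For example, a shared random crop via slicing, applied identically to an image and a multi-channel mask (shapes taken from the example above):

    import torch

    def random_crop_pair(image, mask, size):
        # image: [C, H, W], mask: [num_classes, H, W]
        # Sample one crop location and apply it to both tensors via slicing
        th, tw = size
        _, h, w = image.shape
        top = torch.randint(0, h - th + 1, (1,)).item()
        left = torch.randint(0, w - tw + 1, (1,)).item()
        return (image[:, top:top + th, left:left + tw],
                mask[:, top:top + th, left:left + tw])

    image = torch.rand(3, 800, 800)
    mask = torch.rand(10, 800, 800)
    image_c, mask_c = random_crop_pair(image, mask, (512, 512))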