Torchvision.transforms: How to perform identical transform on both image and target?

You could redefine self.transform to take multiple images/masks and iterate over them, applying the transformations to each pair:

    def transform(self, images, masks):
        # Resize
        resize = transforms.Resize(size=(520, 520))
        for idx in range(len(images)):
            images[idx] = resize(images[idx])
            masks[idx] = resize(masks[idx])
        ...

Would that work for you?
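
For the random transformations you could sample the parameters once and reuse them for both the image and the mask. A minimal sketch for a single image/mask pair (the 512x512 crop size and the 0.5 flip probability are just placeholder values):

    import random
    import torchvision.transforms as transforms
    import torchvision.transforms.functional as TF

    def transform(self, image, mask):
        # Resize image and mask with the same deterministic transform
        resize = transforms.Resize(size=(520, 520))
        image = resize(image)
        mask = resize(mask)

        # Random crop: sample the parameters once and apply them to both
        i, j, h, w = transforms.RandomCrop.get_params(image, output_size=(512, 512))
        image = TF.crop(image, i, j, h, w)
        mask = TF.crop(mask, i, j, h, w)

        # Random horizontal flip with a shared coin flip
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)

        # Transform to tensors
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        return image, mask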


It works! Thank you very much for your help and responsiveness!


I’ve now run into a problem when training my network.
I get the following error: AttributeError: 'list' object has no attribute 'to'
I think this is because I previously converted my masks to PIL Images to do the data augmentation and finally returned them as tensors, but with this shape: torch.Size([1, 256, 256])
However, I think my network expects this shape for the masks: torch.Size([256, 256])
Am I right? And if so, is there a way to turn the masks into 2-dimensional tensors?
Thanks!!

If you are returning a list of tensors, you could use masks = torch.stack(masks) to create a tensor.
However, since you are returning multiple masks for each sample, the batch dimension will be added additionally, so that inside your DataLoader loop you would get a tensor of shape [batch_size, nb_masks, height, width].
If you want to treat each mask separately as a sample, you could call view(-1, height, width) on it to put all masks into the batch dimension, but I’m not sure what your current use case is, so could you explain it a bit?

If you just want to remove the additional dimension in [1, 256, 256], use mask = mask.squeeze(0).
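
A small sketch of both options, assuming each mask in the list has shape [1, 256, 256]:

    import torch

    # Hypothetical list of masks as returned by __getitem__
    masks = [torch.zeros(1, 256, 256) for _ in range(4)]

    stacked = torch.stack(masks)   # shape: [4, 1, 256, 256]
    stacked = stacked.squeeze(1)   # shape: [4, 256, 256]

    # For a single mask of shape [1, 256, 256]:
    mask = masks[0].squeeze(0)     # shape: [256, 256]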


You’re right, masks = torch.stack(masks) is exactly what I need! Thanks again!!


Hi, how about ColorJitter? How can I apply ColorJitter to multiple input images, such as the left and right images in stereo matching? Thanks

You could get the parameters by calling the get_params method and apply the same transformation to each image.
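
A sketch of how this could look for a stereo pair; note that the get_params signature changed between torchvision releases, so this assumes a recent version where it returns the sampled factors (older versions returned a ready-to-apply transform instead):

    import torchvision.transforms as transforms
    import torchvision.transforms.functional as TF

    def color_jitter_pair(left, right, brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1):
        # Sample one set of jitter factors and apply it to both images
        fn_idx, b, c, s, h = transforms.ColorJitter.get_params(
            brightness=[max(0., 1 - brightness), 1 + brightness],
            contrast=[max(0., 1 - contrast), 1 + contrast],
            saturation=[max(0., 1 - saturation), 1 + saturation],
            hue=[-hue, hue],
        )
        for fn_id in fn_idx:
            if fn_id == 0 and b is not None:
                left, right = TF.adjust_brightness(left, b), TF.adjust_brightness(right, b)
            elif fn_id == 1 and c is not None:
                left, right = TF.adjust_contrast(left, c), TF.adjust_contrast(right, c)
            elif fn_id == 2 and s is not None:
                left, right = TF.adjust_saturation(left, s), TF.adjust_saturation(right, s)
            elif fn_id == 3 and h is not None:
                left, right = TF.adjust_hue(left, h), TF.adjust_hue(right, h)
        return left, right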

I’m also encountering this problem. Have you found a good solution?

At kornia.org we are also working on an API to cover similar cases. We plan to provide a set of functionalities compatible with torchvision and other data augmentation frameworks in order to perform data augmentation on raw torch tensors, meaning that it can be done in parallel on the GPU and you can backprop through it. Additionally, our API has a flag to choose whether to return the transform or concatenate the result of the transforms. See an example of horizontal flip here: https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.random_hflip

Can we just concatenate those two images into one, apply the transform, and then split them?

    gt_img    = [1, c, h, w]
    train_img = [1, c, h, w]
    cat_img   = [1, 2c, h, w]

Apply the transform on cat_img, then split the result?
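
A sketch of the idea, assuming the transform is purely geometric and a torchvision version that supports tensor inputs (per-channel color transforms such as ColorJitter would not behave correctly on the concatenated channels):

    import torch
    import torchvision.transforms as transforms

    # Hypothetical tensors for illustration
    gt_img = torch.rand(1, 3, 256, 256)
    train_img = torch.rand(1, 3, 256, 256)

    # Concatenate along the channel dimension so one random
    # geometric transform is applied to both images at once
    cat_img = torch.cat([gt_img, train_img], dim=1)   # [1, 6, 256, 256]

    transform = transforms.Compose([
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
    ])
    cat_out = transform(cat_img)

    # Split back into the two images
    gt_out, train_out = torch.split(cat_out, 3, dim=1)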

What if the mask does not have dimensions that a PIL Image can represent? For example, in a segmentation task the number of channels in a mask can be an arbitrary number. In that case, the torchvision transformations will not work.
I have asked the question in detail here.

In these cases it depends on the format the mask is stored in.
If e.g. PIL cannot load it, the mask is most likely not stored in a classical “image format” and might instead be stored as a numpy array, so it could be loaded via numpy.
Which format do your masks currently have?

For each image, I have the mask as a numpy array (.npy). The mask dimensions are (num_classes, image_height, image_width).

How can I use torchvision transforms on a numpy array?

I do not think you can use torchvision directly on a numpy array. However, you can convert your numpy array to a PIL Image and then use torchvision. You can use ToPILImage() to do that.

I cannot convert a numpy array to a PIL Image when the number of channels (num_classes in the numpy array) is not 3 (converted to an RGB image) or 1 (grayscale).

What is the shape of your numpy array?

The numpy array shape is (10, 800, 800), where 800 is the image height and width.

You could maybe make a list of 10 numpy arrays from this array, cast each of them to a PIL Image, use the torchvision transforms on these PIL Images, and afterwards convert them back to numpy arrays… Also, I think it is really important for the numpy array to have dtype uint8.

That is what I am doing right now. I am converting each channel to a grayscale PIL Image, doing the transforms on the PIL Image, and converting back to a numpy array. Then I concatenate all the numpy arrays to build the transformed mask.

The issue is that I loop over each channel, which significantly slows down the dataloader. Is there a way to vectorize this or to do it efficiently?

The latest torchvision release added support for tensors in some transformations, so you could check if the transformation you are using allows this.
If not, the vectorization depends on the type of transformation you would like to apply.
E.g. random cropping could be performed with a single slicing operation on the tensor.
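
For example, a shared random crop via slicing, applied identically to an image and a multi-channel mask (shapes taken from the example above):

    import torch

    def random_crop_pair(image, mask, size):
        # image: [C, H, W], mask: [num_classes, H, W]
        # Sample one crop location and apply it to both tensors via slicing
        th, tw = size
        _, h, w = image.shape
        top = torch.randint(0, h - th + 1, (1,)).item()
        left = torch.randint(0, w - tw + 1, (1,)).item()
        return (image[:, top:top + th, left:left + tw],
                mask[:, top:top + th, left:left + tw])

    image = torch.rand(3, 800, 800)
    mask = torch.rand(10, 800, 800)
    image_c, mask_c = random_crop_pair(image, mask, (512, 512))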