Torchvision.transfors: How to perform identical transform on both image and target?

Weihao_Yuan · September 4, 2019, 3:12am

Hi, how about ColorJitter. How to do colorJitter for multiple input images like the left and right image in stereo matching? Thanks

ptrblck · September 4, 2019, 9:48am

You could get the parameters by calling the get_params method and apply this transformation on each image.

beitian · November 22, 2019, 2:00am

I‘m also encountering this problem, are you found a good solution?

edgarriba · November 22, 2019, 12:31pm

in kornia.org we are also working on an API to cover similar cases. We plan to cover a set of compatible functionalities with torchvision and other data augmentation framewroks in order to perform data augmentation using raw torch tensors, meaning that can be done in parallel using the gpu + backprop throw that. Additionally, our API has a flag whether to return the transform or concatenate the result of the transforms. See an example here of horizontal flip: https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.random_hflip

BarCodeReader · April 7, 2020, 9:13am

can we just concatenate those two image into one then apply the transform, then split them?

gt_img = [1,c, h,w]
train_img = [1,c,h,w]
cat_img = [1,2c,h,w]
apply transform on cat_img
then split the result?

SatyamGaba · August 12, 2020, 4:20pm

What if the mask does not have dimensions that PIL Image can load? for example in segmentation task number of channels in a mask can be an arbitrary number. In that case, any torchvision transformations will not work.
I have asked the question in details here.

ptrblck · August 13, 2020, 3:30am

In these cases it depends in which format the mask is stored.
If e.g. PIL cannot load it, the mask is most likely not stored in a classical “image format”, and might be instead stored as a numpy array and could thus be loaded via numpy.
Which format do your mask currently have?

SatyamGaba · August 13, 2020, 2:23pm

For each image, I have mask as numpy array(.npy). Mask dimension is (num_classes, height_image, width_image).

How can I use torchvision transforms on a numpy array?

fgauthier · August 13, 2020, 2:55pm

I do not think that you can use torchvision on a numpy array. However you can transform your numpy array to a PIL Image and then use torchvision. You can use ToPILImage() to do that.

SatyamGaba · August 13, 2020, 3:06pm

I can not convert a numpy array to PIL Image when number of channels (num_classes in np array) are not 3(converts to RGB image) or 1 (grayscale).

fgauthier · August 13, 2020, 3:07pm

What is the shape of your numpy array?

SatyamGaba · August 13, 2020, 3:16pm

numpy array shape: (10,800,800)
800 is image height and width

fgauthier · August 13, 2020, 3:19pm

You could maybe make a list of 10 numpy array with this numpy array and then cast them all in PIL Image and use the torchvision transforms on these PIL image and after convert them again in numpy array…Also I think it is really important to have a numpy array of dtype uint8

SatyamGaba · August 13, 2020, 3:25pm

That is what I am doing right now. I am converting each channel to a grayscale PIL Image, doing the transforms on PIL Image and converting back to numpy array.Then, I concatenate all the the numpy array to make a transformed mask.

The issue is that I use loop for each of the channel and it significantly slows down the dataloader. Is there a way to vectorize this or to do this efficiently?

ptrblck · August 14, 2020, 3:40am

The latest torchvision release added support for tensors in some transformations, so you could check if the transformation you are using would allow this.
If not, the vectorization depends on the type of transformation you would like to apply.
E.g. a random cropping could be performed with a single slicing operation on the tensor.

Siddharth_Shrivasta2 · July 30, 2021, 1:04pm

image = TF.to_tensor(image)
mask = TF.to_tensor(mask)

This transform converts image and mask values to between 0 and 1. Thus, to preserve values of image and mask, transform can be performed as:

image = torch.tensor(np.array(image))
mask = torch.tensor(np.array(mask))

Mario_Galindo · July 24, 2022, 6:41pm

I think I have a simple solution:
If the images are concatenated, the transformations are applied to all of them identically:

import torch
import torchvision.transforms as T

# Create two fake images (identical for test purposes):
image = torch.randn((3, 128, 128))
target = image.clone()

# This is the trick (concatenate the images):
both_images = torch.cat((image.unsqueeze(0), target.unsqueeze(0)),0)

# Apply the transformations to both images simultaneously:
transformed_images = T.RandomRotation(180)(both_images)

# Get the transformed images:
image_trans = transformed_images[0]
target_trans = transformed_images[1]

# Compare the transformed images:
torch.all(image_trans == target_trans).item()

>> True

sepid_kh · October 4, 2022, 3:35am

Hi @ptrblck,

Can you please help me with how we can apply image normalization like your example?
As it looks like we cannot apply transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) in this way, like your example? I mean we cannot use
image = transforms.Normalize(image, mean,std)?
(I need something like: norm_image = Normalize(image, mean, std)).

ptrblck · October 4, 2022, 4:04am

transforms.Normalize(mean, std) will create a transformation object which you could then call directly (similar to any module you are creating e.g. nn.Linear):

transform = transforms.Normalize(mean, std)
output = transform(input)

If you don’t want to create an object first and apply it later, but directly apply the transformation you could use the functional API (similar to the nn.functional API e.g. via F.linear):

output = transforms.functional.normalize(input, mean, std)

sepid_kh · October 4, 2022, 5:09am

Many thanks @ptrblck, yes I need to apply that directly.