Torchvision.transforms: How to perform an identical transform on both image and target?

Hi, how about ColorJitter? How can I apply the same ColorJitter to multiple input images, e.g. the left and right images in stereo matching? Thanks

You could get the parameters by calling the get_params method and then apply the same transformation to each image.
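Since the return value of ColorJitter.get_params has changed across torchvision releases, a version-agnostic sketch is to sample the jitter factors once yourself and apply the functional ops to both images (jitter_pair and the parameter ranges below are made-up names/values, just for illustration):

import random
import torchvision.transforms.functional as TF

def jitter_pair(left, right, brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05):
    # Sample one set of factors and reuse it for both stereo images.
    b = random.uniform(1 - brightness, 1 + brightness)
    c = random.uniform(1 - contrast, 1 + contrast)
    s = random.uniform(1 - saturation, 1 + saturation)
    h = random.uniform(-hue, hue)
    out = []
    for img in (left, right):
        img = TF.adjust_brightness(img, b)
        img = TF.adjust_contrast(img, c)
        img = TF.adjust_saturation(img, s)
        img = TF.adjust_hue(img, h)
        out.append(img)
    return out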

I'm also encountering this problem. Have you found a good solution?

At kornia.org we are also working on an API to cover similar cases. We plan to cover a set of functionalities compatible with torchvision and other data augmentation frameworks, so that data augmentation can be performed on raw torch tensors, meaning it can run in parallel on the GPU and you can backprop through it. Additionally, our API has a flag to return the applied transformation together with the result of the transforms. See an example of a horizontal flip here: https://kornia.readthedocs.io/en/latest/augmentation.html#kornia.augmentation.random_hflip

Can we just concatenate those two images into one, apply the transform, and then split them?

gt_img = [1, c, h, w]
train_img = [1, c, h, w]
cat_img = [1, 2c, h, w]
apply the transform on cat_img
then split the result?
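That works for geometric transforms when the inputs are tensors (newer torchvision releases accept arbitrary channel counts there), but not for color transforms such as ColorJitter, which expect 1 or 3 channels. A rough sketch of the idea:

import torch
import torchvision.transforms as T

gt_img = torch.randn(1, 3, 256, 256)
train_img = torch.randn(1, 3, 256, 256)

cat_img = torch.cat((gt_img, train_img), dim=1)    # [1, 2c, h, w]
cat_out = T.RandomCrop(224)(cat_img)               # same crop applied to both
gt_out, train_out = torch.split(cat_out, 3, dim=1) # back to [1, c, h, w] each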

What if the mask does not have dimensions that a PIL Image can hold? For example, in a segmentation task the number of channels in a mask can be an arbitrary number. In that case, the torchvision transformations will not work.
I have asked the question in detail here.

In these cases it depends on the format the mask is stored in.
If e.g. PIL cannot load it, the mask is most likely not stored in a classical "image format"; it might instead be stored as a numpy array and could thus be loaded via numpy.
Which format does your mask currently have?

For each image, I have the mask as a numpy array (.npy). The mask dimensions are (num_classes, image_height, image_width).

How can I use torchvision transforms on a numpy array?

I do not think you can use torchvision directly on a numpy array. However, you can convert your numpy array to a PIL Image and then use torchvision. You can use ToPILImage() to do that.
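A minimal sketch, assuming the array is uint8 in H x W x C layout with a channel count that PIL supports (which is exactly the limitation raised in the next post):

import numpy as np
import torchvision.transforms as T

arr = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # H x W x C
pil_img = T.ToPILImage()(arr)  # now usable with the torchvision transforms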

I cannot convert a numpy array to a PIL Image when the number of channels (num_classes in the numpy array) is not 3 (converted to an RGB image) or 1 (grayscale).

What is the shape of your numpy array?

numpy array shape: (10, 800, 800)
800 is the image height and width

You could maybe split this numpy array into a list of 10 arrays, cast each of them to a PIL Image, apply the torchvision transforms on these PIL Images, and then convert them back to numpy arrays, as sketched below. Also, I think it is really important that the numpy array has dtype uint8.
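A rough sketch of that per-channel approach, assuming a uint8 mask and using a rotation as the example transform; the angle is sampled once so every channel receives the identical transformation:

import numpy as np
import torchvision.transforms.functional as TF
from PIL import Image

mask = np.random.randint(0, 256, (10, 800, 800), dtype=np.uint8)

angle = 30  # sample the parameters once and reuse them for every channel
channels = []
for c in mask:
    pil = Image.fromarray(c)      # each channel becomes a grayscale PIL Image
    pil = TF.rotate(pil, angle)   # identical transform applied per channel
    channels.append(np.array(pil))
mask_t = np.stack(channels, axis=0)  # back to (10, 800, 800)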

That is what I am doing right now. I am converting each channel to a grayscale PIL Image, applying the transforms on the PIL Image, and converting back to a numpy array. Then, I concatenate all the numpy arrays to build the transformed mask.

The issue is that I use a loop over the channels and it significantly slows down the dataloader. Is there a way to vectorize this or do it efficiently?

The latest torchvision release added support for tensors in some transformations, so you could check if the transformation you are using would allow this.
If not, the vectorization depends on the type of transformation you would like to apply.
E.g. a random crop could be performed with a single slicing operation on the tensor, as shown below.
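For instance, a random crop applied identically to the image and the multi-channel mask could look like this (the crop size and tensor shapes are just illustrative):

import torch

image = torch.randn(3, 800, 800)            # C x H x W
mask = torch.randint(0, 2, (10, 800, 800))  # num_classes x H x W

crop_h, crop_w = 512, 512
top = torch.randint(0, image.shape[1] - crop_h + 1, (1,)).item()
left = torch.randint(0, image.shape[2] - crop_w + 1, (1,)).item()

# One slicing operation crops all channels of both tensors identically.
image_crop = image[:, top:top + crop_h, left:left + crop_w]
mask_crop = mask[:, top:top + crop_h, left:left + crop_w]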

import torchvision.transforms.functional as TF

image = TF.to_tensor(image)
mask = TF.to_tensor(mask)

This transform scales the image and mask values to the range [0, 1] (for uint8 inputs). Thus, to preserve the original values of the image and mask, the conversion can be done as:

image = torch.tensor(np.array(image))
mask = torch.tensor(np.array(mask))

I think I have a simple solution:
If the images are concatenated, the transformations are applied to all of them identically:

import torch
import torchvision.transforms as T

# Create two fake images (identical for test purposes):
image = torch.randn((3, 128, 128))
target = image.clone()

# This is the trick (concatenate the images):
both_images = torch.cat((image.unsqueeze(0), target.unsqueeze(0)), 0)

# Apply the transformations to both images simultaneously:
transformed_images = T.RandomRotation(180)(both_images)

# Get the transformed images:
image_trans = transformed_images[0]
target_trans = transformed_images[1]

# Compare the transformed images:
torch.all(image_trans == target_trans).item()

>> True

Hi @ptrblck,

Can you please help me with how to apply image normalization in the same way as your example?
It looks like transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) cannot be applied that way. I mean, we cannot call
image = transforms.Normalize(image, mean, std)?
(I need something like: norm_image = Normalize(image, mean, std).)

transforms.Normalize(mean, std) will create a transformation object, which you can then call directly (similar to any module you create, e.g. nn.Linear):

transform = transforms.Normalize(mean, std)
output = transform(input)

If you don't want to create an object first and apply it later, but instead apply the transformation directly, you could use the functional API (similar to the nn.functional API, e.g. F.linear):

output = transforms.functional.normalize(input, mean, std)

Many thanks @ptrblck, yes I need to apply that directly.