`torchvision` v2 transforms not applying to both image and mask

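This is expected behavior of the `transforms.v2` API. When both inputs are plain `torch.Tensor`s, only the first one is transformed and the mask is passed through unchanged. From the docs: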

> If there is no `Image` or `Video` instance, only the first pure `torch.Tensor` will be transformed as image or video, while all others will be passed-through. Here “first” means “first in a depth-wise traversal”.

Wrap the inputs in the `tv_tensors` classes instead, so the transform knows which tensor is the image and which is the mask:

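```python
from torchvision import tv_tensors

# Wrap the inputs so transforms.v2 dispatches on their type
img_tv = tv_tensors.Image(img)
mask_tv = tv_tensors.Mask(mask)

# `transforms` is your v2 pipeline; both outputs get the same random parameters
out1, out2 = transforms(img_tv, mask_tv)
```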