Why is there a difference between batch and individual transforms?

pytorch 2.1.2
torchvision 0.16.2

I tried applying the v2 transforms individually with a for loop:
pp_img1 = [preprocess(image) for image in orignal_images]
and as a batch:
pp_img2 = preprocess(orignal_images)

but I found that the outputs differ after preprocessing.
Specifically, pp_img1[0] and pp_img2[0] are the same,
but pp_img1[1] and pp_img2[1], and so on, are different.

Should they produce the same result, or am I misunderstanding some concept? Why is there a difference between batch and individual transforms?

The full code:

import torch
from torchvision.transforms import v2
preprocess = v2.Compose([
    v2.ToTensor(),               # Convert to tensor (0, 1)
    v2.Normalize([0.5], [0.5]),  # Map to (-1, 1)
])
# individual
pp_img1 = [preprocess(image) for image in orignal_images]
# batch
pp_img2 = preprocess(orignal_images)

# Concatenate both result lists and display them side by side for comparison
Z = pp_img1 + pp_img2
show_images(torch.stack(Z))

[Image: show_images output of the stacked pp_img1 + pp_img2 tensors]
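
A quick element-wise check (a minimal sketch, reusing pp_img1 and pp_img2 from the code above) makes the mismatch explicit:

for i, (a, b) in enumerate(zip(pp_img1, pp_img2)):
    print(i, torch.allclose(a, b))
# Per the observation above: True for i == 0, False for the rest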


It seems only the first sample is transformed in v2.Normalize as seen here:

import torch
from torchvision.transforms import v2

# Plain tensors, not wrapped in tv_tensors.Image
ref = [torch.randn(3, 4, 4) for _ in range(2)]

norm = v2.Normalize([0.5], [0.5])
out1 = [norm(r) for r in ref]  # individual
out2 = norm(ref)               # whole list at once

print(torch.stack(out1) - torch.stack(out2))
# tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000]],

#          [[ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000]],

#          [[ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000],
#           [ 0.0000,  0.0000,  0.0000,  0.0000]]],


#         [[[-0.2077, -2.1606, -3.0535,  0.2506],
#           [-0.1663,  0.7301, -3.8487, -0.4478],
#           [-1.8119, -0.1568, -0.2179, -2.6124],
#           [-0.6122, -1.1466, -1.0353, -0.2566]],

#          [[-0.8328, -2.4928, -1.3262, -0.8081],
#           [-0.3667, -1.4218, -1.9000, -1.0971],
#           [-1.5551,  0.6238, -1.2795, -1.7695],
#           [-0.1538, -1.3588, -1.2829, -1.9727]],

#          [[-0.6973, -0.9461, -2.5446,  1.0588],
#           [ 1.0722, -0.1794, -0.7449, -2.8416],
#           [-0.4963, -1.0180, -1.4314, -0.1204],
#           [ 0.4762, -0.5133, -0.5140, -3.4192]]]])

print(out2[1] - ref[1])
# tensor([[[0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.]],

#         [[0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.]],

#         [[0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.]]])

@pmeier do you know if this is expected behavior and thus an unsupported use case?


The issue is that you are using plain tensors as images. In that scenario, @ptrblck is right: only the first item is transformed, for backwards compatibility (BC) with v1. See the note in this section of the documentation.

What you want to do is to wrap your images in our new tensor subclasses, e.g.

from torchvision import tv_tensors

orignal_images = [tv_tensors.Image(image) for image in orignal_images]

After that, there should be no difference between the two methods. Note that this wrapping step is mandatory for all inputs other than images.
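
For example, a minimal sketch (reusing the random-tensor setup from the snippet above) showing that wrapped inputs behave the same either way:

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

norm = v2.Normalize([0.5], [0.5])

# Wrapping in tv_tensors.Image makes v2 treat every element as an image,
# not just the first pure tensor in the list.
ref = [tv_tensors.Image(torch.randn(3, 4, 4)) for _ in range(2)]

out1 = [norm(r) for r in ref]  # individual
out2 = norm(ref)               # whole list at once

print(all(torch.equal(a, b) for a, b in zip(out1, out2)))
# expected: True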

Furthermore, ToTensor is deprecated. Use v2.ToImage() followed by v2.ToDtype(dtype=torch.float32, scale=True) instead. The former will also handle the wrapping into tv_tensors.Image for you.

So basically your example will be solved by using

preprocess = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize([0.5], [0.5]),
])
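
As a usage sketch (assuming orignal_images is a list of PIL images, or inputs already wrapped as tv_tensors.Image), both calling styles should now agree:

pp_img1 = [preprocess(image) for image in orignal_images]  # individual
pp_img2 = preprocess(orignal_images)                       # whole list at once

print(all(torch.equal(a, b) for a, b in zip(pp_img1, pp_img2)))
# expected: True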