Concatenating images and passing through the model

What is the difference between concatenating two images and passing them through a ResNet network, versus passing both images individually through the network? Code 1 shows the first scenario and Code 2 the second.

Code 1

images = [x1, x2]
images = torch.cat([images[0], images[1]], dim=0)  # [2*bsz, 3, 32, 32]
bsz = labels.shape[0]
features = model(images)                           # [2*bsz, 128]
f1, f2 = torch.split(features, [bsz, bsz], dim=0)  # two [bsz, 128] tensors
features = torch.cat([f1.unsqueeze(1), f2.unsqueeze(1)], dim=1)  # [bsz, 2, 128]

Code 2

# x1 and x2 are the two image batches
out_1 = model(x1)                       # [bsz, 128]
out_2 = model(x2)                       # [bsz, 128]
out = torch.cat([out_1, out_2], dim=0)  # [2*bsz, 128]

Note: the output feature dimension of the model is 128.
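For reference, a minimal self-contained setup along these lines (the ResNet variant is an assumption for illustration; the 128-d head follows the note above):

import torch
import torch.nn as nn
from torchvision.models import resnet18

# Assumed setup: a ResNet-18 whose final fully connected layer is
# replaced so the model outputs 128-d features, as noted above.
model = resnet18()
model.fc = nn.Linear(model.fc.in_features, 128)

# Two batches of images and their labels (shapes assumed).
x1 = torch.randn(256, 3, 32, 32)
x2 = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))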

Technically there should be no difference, but it looks like in Code 1 you are doing the concatenation at dim=0. This could cause issues:

Say the two image dims are (single images, in PyTorch's channels-first layout):

x1 -> 3x32x32
x2 -> 3x32x32

Doing Code 1 will then produce the following dims:

images -> 6x32x32

You need to unsqueeze each image at dim=0 first. That would make the dims as follows:

x1 -> 1x3x32x32
x2 -> 1x3x32x32
images -> 2x3x32x32
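A minimal sketch of that fix (shapes assumed; torch.stack is equivalent to unsqueezing and then concatenating):

x1 = torch.randn(3, 32, 32)  # single image, no batch dim
x2 = torch.randn(3, 32, 32)

# Add the batch dimension explicitly...
images = torch.cat([x1.unsqueeze(0), x2.unsqueeze(0)], dim=0)  # [2, 3, 32, 32]

# ...or equivalently in one step:
images = torch.stack([x1, x2], dim=0)  # [2, 3, 32, 32]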

This won't be an issue in Code 2, as the images are handled individually. Let me know if that's the case.

In my case the inputs are already batched:

x1 -> [256, 3, 32, 32]
x2 -> [256, 3, 32, 32]
images -> [512, 3, 32, 32]

Here 256 is the batch size, and the resulting features have shape:

features -> [256, 2, 128]
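For completeness, a quick sanity check of those shapes (assuming the model maps [N, 3, 32, 32] to [N, 128]):

features = model(torch.cat([x1, x2], dim=0))  # [512, 128]
assert features.shape == (512, 128)
f1, f2 = torch.split(features, [256, 256], dim=0)
features = torch.cat([f1.unsqueeze(1), f2.unsqueeze(1)], dim=1)
assert features.shape == (256, 2, 128)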

I think in Code 1 you are concatenating two batches of images, so if there is enough memory, it will work fine.
In Code 2 you run the model twice, so it takes more time for the same work.

The only thing to take care of is whether you have the capacity to send the doubled batch (2 x 256 = 512 images) through the model at once; if yes, then you can use Code 1, which runs all the images in parallel.
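If you want to verify that the two versions give the same features, a quick check along these lines should work (a sketch; note that in train mode, BatchNorm layers would compute statistics over the combined 512-image batch in Code 1 but over each 256-image batch in Code 2, so the outputs can differ slightly there; in eval mode they should match):

model.eval()  # use running BatchNorm stats so both paths are deterministic
with torch.no_grad():
    # Code 1 style: one forward pass over the combined 512-image batch.
    f_cat = model(torch.cat([x1, x2], dim=0))
    f1, f2 = torch.split(f_cat, [256, 256], dim=0)
    # Code 2 style: two separate 256-image passes.
    out_1, out_2 = model(x1), model(x2)
print(torch.allclose(f1, out_1, atol=1e-5), torch.allclose(f2, out_2, atol=1e-5))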
