How to change an image which has dimensions (512, 512, 3) to a tensor of size ([1, 3, 224, 224])?

Hi guys,
I was trying to implement a paper where the input dimensions are meant to be a tensor of size ([1, 3, 224, 224]). My current image size is (512, 512, 3).

How do I resize and convert in order to input to the model?

Any help will be much appreciated. Thanks!

First you would have to permute the dimensions to create a channels-first tensor from the channels-last input, and could then use torchvision.transforms.Resize to resize the tensor to the desired 224x224 spatial size:

import torch
from torchvision import transforms

# channels-last image-shaped tensor: (H, W, C) = (512, 512, 3)
input = torch.randn(512, 512, 3)

# move the channel dimension to the front: (C, H, W)
x = input.permute(2, 0, 1)
print(x.shape)
# torch.Size([3, 512, 512])

# resize the spatial dimensions to 224x224
transform = transforms.Resize((224, 224))
out = transform(x)
print(out.shape)
# torch.Size([3, 224, 224])
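For reference, the same resize could also be sketched with torch.nn.functional.interpolate instead of transforms.Resize; this is only an alternative, not what is used above, and it assumes a floating-point, batched input:

import torch
import torch.nn.functional as F

x = torch.randn(3, 512, 512)  # channels-first tensor, as produced by the permute above
# interpolate expects a batch dimension, so add one temporarily
resized = F.interpolate(x.unsqueeze(0), size=(224, 224), mode='bilinear', align_corners=False)
print(resized.shape)
# torch.Size([1, 3, 224, 224])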

Thanks! It helps.
I just had a follow-up:
This is what was mentioned in the paper implementation: make sure the dimension of image is 3 and 4th dimension is batch dim.
So how do I add the batch dimension to the tensor?
(I am very sorry if the question is lame, but I am new to PyTorch and learning a bit by implementing.)

I don’t know which paper you are reading, but

make sure the dimension of image is 3 and 4th dimension is batch dim.

sounds strange.
The dimension of the image tensor would be 4 if you add a batch dimension, so I guess the first part targets the input image without the batch dimension?
The batch dimension is added at dim0 by default (in all frameworks, if I'm not mistaken).
For your tensor, you could use:

out = out.unsqueeze(0)

to add the batch dimension which would then create out in the shape [1, 3, 224, 224].
I don’t know why one would add the batch dim as the 4th dim, but the paper might be working on a special use case where the dimensions are used in an unconventional way.
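Putting the two replies together, a minimal sketch of the full pipeline for a channels-last tensor (shapes assumed from the thread) would be:

import torch
from torchvision import transforms

img = torch.randn(512, 512, 3)            # channels-last (H, W, C)
x = img.permute(2, 0, 1)                  # -> [3, 512, 512]
x = transforms.Resize((224, 224))(x)      # -> [3, 224, 224]
x = x.unsqueeze(0)                        # add batch dim -> [1, 3, 224, 224]
print(x.shape)
# torch.Size([1, 3, 224, 224])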

Thanks, I think it should do the job!
Also, while converting from image to tensor I used this:

from torchvision import transforms

input = sample_f_img  # sample_f_img is the input image
convert_tensor = transforms.ToTensor()
input = convert_tensor(input)

and then applying your solution on top of this,
I am getting a weird size: torch.Size([1, 512, 224, 224])
whereas the expected is meant to be: torch.Size([1, 3, 224, 224])

I was wondering if you had any possible explanation for why this happened? Or is my image-to-tensor conversion not done the right way?

Could you post the dtype and shape of input?

Sure,

dtype=uint8
shape=(512, 512, 3)

Thanks! It seems to work for me:

import numpy as np
from torchvision import transforms

# uint8 HWC array matching the shape and dtype you posted
input = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
out = transforms.ToTensor()(input)
print(out.shape)
# torch.Size([3, 512, 512])
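For completeness, here is a minimal sketch combining ToTensor with the resize and batch steps from above (assuming the same 512x512x3 uint8 image); since ToTensor already returns a channels-first tensor, no extra permute is applied here:

import numpy as np
from torchvision import transforms

img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in for sample_f_img
x = transforms.ToTensor()(img)            # -> [3, 512, 512], values scaled to [0, 1]
x = transforms.Resize((224, 224))(x)      # -> [3, 224, 224]
x = x.unsqueeze(0)                        # -> [1, 3, 224, 224]
print(x.shape)
# torch.Size([1, 3, 224, 224])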