How to change an image which has dimensions (512, 512, 3) to a tensor of size ([1, 3, 224, 224])?

Hi guys,
I was trying to implement a paper where the input dimensions are meant to be a tensor of size ([1, 3, 224, 224]). My current image size is (512, 512, 3).

How do I resize and convert in order to input to the model?

Any help will be much appreciated. Thanks!

First you would have to permute the dimensions to create a channels-first tensor from the channels-last input, and could then use torchvision.transforms.Resize to resize the tensor to the desired 224x224 spatial size:

import torch
from torchvision import transforms

input = torch.randn(512, 512, 3)
x = input.permute(2, 0, 1)
# torch.Size([3, 512, 512])

transform = transforms.Resize((224, 224))
out = transform(x)
# torch.Size([3, 224, 224])

Thanks! It helps.
I just had a follow up:
This is what was mentioned in the paper implementation: make sure the dimension of image is 3 and 4th dimension is batch dim.
So how do I add the batch dimension to the tensor?
(I am very sorry if the question is lame, but I am new to PyTorch and learning a bit by implementing.)

I don’t know which paper you are reading, but

make sure the dimension of image is 3 and 4th dimension is batch dim.

sounds strange.
The dimension of the image tensor would be 4 if you add a batch dimension, so I guess the first part targets the input image without the batch dimension?
The batch dimension is added in dim0 by default (in all frameworks, if I'm not mistaken).
For your tensor, you could use:

out = out.unsqueeze(0)

to add the batch dimension, which would then create out in the shape [1, 3, 224, 224].
I don’t know why one would add the batch dim into the 4th dim, but the paper might be working on a special use case where dimensions might be used in an unconventional way.

Thanks, I think it should do the job!
Also, while I was converting from image to tensor using this:

from torchvision import transforms

input = sample_f_img  # sample_f_img is the input image
convert_tensor = transforms.ToTensor()
input = convert_tensor(input)

and then applying your solution to it,
I am getting a weird size: torch.Size([1, 512, 224, 224])
whereas the expected size is: torch.Size([1, 3, 224, 224])

I was wondering if you had any possible explanation for why this happened? Or is my image to tensor conversion not done the right way?

Could you post the dtype and shape of input?


shape=(512, 512, 3)

Thanks! It seems to work for me:

import numpy as np
from torchvision import transforms

input = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
out = transforms.ToTensor()(input)
# torch.Size([3, 512, 512])