PyTorch transforms give me weird results

sidd.suresh97 · February 12, 2020, 10:41pm

tfms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    #transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

I’m trying to resize my image to 224,224 and then centre crop it. Here is the original image and the image after the transform. Why are there 9 images after the transform is applied?

Image after the transform:
[image: Screenshot 2020-02-11 at 8.06.02 PM]

sidd.suresh97 · February 12, 2020, 10:43pm

Image before the transform

ptrblck · February 13, 2020, 12:43am

Could you print the shape and type of the image?
The result looks interleaved, which would happen if you, for example, used view instead of permute to swap the axes.

sidd.suresh97 · February 13, 2020, 4:56am

This is the code:-

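import PIL.Image
from torchvision import transforms

# Resize the shorter side to 224, crop the central 224x224 patch, convert to a CHW tensor
tfms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    #transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
# object_image and context_image are HWC numpy arrays
object_img_tensor = tfms(PIL.Image.fromarray(object_image))
context_img_tensor = tfms(PIL.Image.fromarray(context_image))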

Type of image before transform is applied: <class 'numpy.ndarray'>
Shape of the image before transform is applied: (480, 640, 3)
Type of image after transform is applied: <class 'torch.Tensor'>
Shape of the image after transform is applied: torch.Size([3, 224, 224])

ptrblck · February 13, 2020, 5:07am

The shapes look alright.
Could you post the code you are using to visualize the image?

sidd.suresh97 · February 13, 2020, 9:44am

plt.imshow(object_img_tensor.reshape(224,224,3))
plt.show()

vgsprasad · February 13, 2020, 11:51am

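The data was interleaved by the ‘reshape’ call. reshape keeps the elements in their flattened order and only changes the shape, so it does not move the channel axis to the end. Use ‘permute’ instead to reorder the dimensions from [3][224][224] to [224][224][3], like this: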

plt.imshow(object_img_tensor.permute(1,2,0))
plt.show()
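
To make the difference concrete, here is a minimal sketch on a tiny tensor rather than a real image. The 2x3 "image" and the variable names (hwc, chw, restored, scrambled) are made up for illustration, and the permute(2, 0, 1) step is only an assumption standing in for the layout change that ToTensor performs (it ignores the scaling to [0, 1]).

```python
import torch

# Tiny stand-in for an image in (H, W, C) layout, the layout plt.imshow expects.
# Height 2, width 3, 3 channels; values 0..17 so every element is identifiable.
hwc = torch.arange(2 * 3 * 3).reshape(2, 3, 3)

# Channels-first (C, H, W) version, mimicking the layout ToTensor returns.
chw = hwc.permute(2, 0, 1)

# permute reorders the axes, so each pixel keeps its own three channel values.
restored = chw.permute(1, 2, 0)

# reshape keeps the flattened element order and only relabels the shape.
scrambled = chw.reshape(2, 3, 3)

print(torch.equal(restored, hwc))   # True
print(torch.equal(scrambled, hwc))  # False
print(hwc[0, 0].tolist(), scrambled[0, 0].tolist())
```

The first pixel of the original holds its three channel values, while the first pixel of the reshaped tensor holds three neighbouring values taken from channel 0 only.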

sidd.suresh97 · February 13, 2020, 3:39pm

Ok this works. I’m just curious as to how the reshape function interleaves the data?

vgsprasad · February 14, 2020, 11:49am

Let X be an input array of dimension [3][224][224]. When X is reshaped, it is first flattened in row-major order as X[0][0][0], X[0][0][1], …, X[0][0][223], X[0][1][0], …, X[0][223][223], X[1][0][0], …, X[2][223][223]. The flattened elements are then taken three at a time to form an output array Y of dimension [224][224][3]. Hence the output pixels are Y[0][0][0] = X[0][0][0], Y[0][0][1] = X[0][0][1], Y[0][0][2] = X[0][0][2], Y[0][1][0] = X[0][0][3], …, Y[223][223][2] = X[2][223][223]. Each output pixel is therefore built from three horizontally adjacent values of a single channel, not from the R, G and B values of one pixel.
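
The index mapping above can be checked on a small array. This is only a sketch: it uses NumPy and a [3][4][4] array in place of the [3][224][224] tensor, on the assumption that reshape on a contiguous torch tensor follows the same row-major order as NumPy's reshape. The names C, H, W and flat are introduced here for illustration.

```python
import numpy as np

C, H, W = 3, 4, 4  # small stand-in for [3][224][224]

# X[c][h][w] holds its own flat index, so the origin of every value is visible.
X = np.arange(C * H * W).reshape(C, H, W)

# The problematic call: same flattened data, new shape [H][W][C].
Y = X.reshape(H, W, C)

# Row-major flattening of X: X[0][0][0], X[0][0][1], ..., X[2][3][3].
flat = X.ravel()

# Y[i][j][k] is simply the ((i*W + j)*C + k)-th element of the flattened X.
for i in range(H):
    for j in range(W):
        for k in range(C):
            assert Y[i, j, k] == flat[(i * W + j) * C + k]

print(Y[0, 0], X[0, 0, :3])    # first output pixel vs. first three values of channel 0
print(Y[0, 1, 0], X[0, 0, 3])  # next output pixel starts at X[0][0][3]
```

This also suggests why the picture shows 9 tiles. With the real sizes, one output row holds 224 × 3 = 672 values, which is exactly three consecutive rows of one channel placed side by side, each squeezed to about a third of the width. Going down the output, roughly the first third of the rows comes from channel 0, the next third from channel 1 and the last third from channel 2, giving a 3 × 3 grid of small copies.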