PyTorch transforms give me weird results

tfms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    #transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

I’m trying to resize my image to 224,224 and then centre crop it. Here is the original image and the image after the transform. Why are there 9 images after the transform is applied?

Image after the transform: [image]

Image before the transform: [image]

Could you print the shape and type of the image?
The result looks interleaved, which would happen if you, e.g., used view instead of permute to swap the axes.

This is the code:

import PIL.Image
from torchvision import transforms

tfms = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    #transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
# Convert the NumPy arrays to PIL images before applying the transforms
object_img_tensor = tfms(PIL.Image.fromarray(object_image))
context_img_tensor = tfms(PIL.Image.fromarray(context_image))

Type of image before transform is applied: <class 'numpy.ndarray'>
Shape of the image before transform is applied: (480, 640, 3)
Type of image after transform is applied: <class 'torch.Tensor'>
Shape of the image after transform is applied: torch.Size([3, 224, 224])

The shapes look alright.
Could you post the code you are using to visualize the image?

plt.imshow(object_img_tensor.reshape(224,224,3))
plt.show()

The data was interleaved by the reshape call. reshape only regroups the elements in their existing memory order; it does not swap axes. Use permute instead to move the channel dimension to the end, like this:

plt.imshow(object_img_tensor.permute(1,2,0))
plt.show()
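
To see the difference without plotting anything, here is a minimal sketch on a tiny tensor. The 3x2x4 tensor and the variable names are made up for illustration and just stand in for the [3, 224, 224] image tensor above.

```python
import torch

# Small stand-in for a CHW image tensor: 3 channels, 2 rows, 4 columns.
chw = torch.arange(3 * 2 * 4).reshape(3, 2, 4)

hwc_permuted = chw.permute(1, 2, 0)  # moves the channel axis to the end
hwc_reshaped = chw.reshape(2, 4, 3)  # only regroups the flat memory order

# With permute, every value stays attached to its (channel, row, col) position.
same_pixels = all(
    bool(hwc_permuted[h, w, c] == chw[c, h, w])
    for c in range(3) for h in range(2) for w in range(4)
)

# Both results have shape (2, 4, 3), but they hold different values.
shapes_match = hwc_permuted.shape == hwc_reshaped.shape
values_match = torch.equal(hwc_permuted, hwc_reshaped)

print(same_pixels, shapes_match, values_match)
```

The matching shapes are why the bug is easy to miss: imshow accepts both arrays, but only the permuted one has the right value at each pixel.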

Ok this works. I’m just curious as to how the reshape function interleaves the data?

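Let X be the input array of shape [3][224][224]. reshape keeps the elements in their flattened (row-major) order, X[0][0][0], X[0][0][1], …, X[0][0][223], X[0][1][0], …, X[0][223][223], X[1][0][0], …, X[2][223][223], and only regroups them. Consecutive groups of 3 elements become the pixels of an output array Y of shape [224][224][3]. Hence Y[0][0][0] = X[0][0][0], Y[0][0][1] = X[0][0][1], Y[0][0][2] = X[0][0][2], Y[0][1][0] = X[0][0][3], …, Y[223][223][2] = X[2][223][223]. So the three "colour" values of each output pixel are really three neighbouring values from the same row of a single channel. Each output row holds 224 × 3 = 672 values, i.e. three consecutive input rows of one channel laid side by side, and each channel fills roughly a third of the output rows. That is why you see a 3 × 3 grid of small, greyish copies of the image.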
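
The index mapping above can be checked with a short sketch. It uses NumPy as a stand-in (reshape on a contiguous tensor follows the same row-major order), and source_of is a hypothetical helper written just for this illustration.

```python
import numpy as np

C, H, W = 3, 224, 224
x = np.arange(C * H * W).reshape(C, H, W)  # each entry holds its own flat index
y = x.reshape(H, W, C)                     # the incorrect "HWC" view

def source_of(i, j, k):
    """Return the (channel, row, col) of x that y[i, j, k] was read from."""
    flat = (i * W + j) * C + k         # position of y[i, j, k] in flat order
    c, rest = divmod(flat, H * W)      # which channel plane of x
    r, col = divmod(rest, W)           # row and column inside that plane
    return c, r, col

print(source_of(0, 0, 2))        # (0, 0, 2)
print(source_of(0, 1, 0))        # (0, 0, 3)
print(source_of(223, 223, 2))    # (2, 223, 223)

# Output row 10 spans input rows 30..32 of channel 0.
print(source_of(10, 0, 0))       # (0, 30, 0)
print(source_of(10, 223, 2))     # (0, 32, 223)
```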