Why (h,w) in Resize?

Hi, I love the PyTorch dataloaders and transforms and have been using them for images.

One thing puzzles me, though: why does the Resize function take its size argument as (height, width) instead of (width, height), which seems to be the convention in PIL and cv2? My code’s behavior thoroughly confused me until I finally opened the docs and saw that they clearly state the argument is (h, w). It was an easy fix, but it left me wondering why PyTorch seems to flip the convention.


Which Resize function are you looking at?

My guess is that (height, width) is more like (rows, columns).
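To illustrate the rows-and-columns reading, here is a small NumPy sketch. The 3x4 image size and the pixel coordinates are arbitrary.

```python
import numpy as np

height, width = 3, 4
# A grayscale image as a 2-D array: one row per pixel row, one column per pixel column.
img = np.zeros((height, width), dtype=np.uint8)

# The pixel at horizontal position x and vertical position y lives at img[y, x],
# i.e. [row, column], so the row count (the height) comes first.
x, y = 3, 1
img[y, x] = 255

print(img.shape)  # (3, 4)
```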

This one: torchvision.transforms.Resize.

I guess rows x columns makes sense, but that isn’t the standard in the other image packages I’ve seen.

This was changed in torchvision 0.2.0 to be consistent with other transformations (source).

Converting a PIL.Image to a NumPy array gives you [H, W, C]. Just a guess, but maybe it’s cleaner to put height before width everywhere?

OK, it makes sense to keep the convention consistent across transformations. Thanks.