How to resize and pad in a torchvision.transforms.Compose()?

nansravn · March 3, 2020, 2:38pm

I’m creating a torchvision.datasets.ImageFolder() data loader, adding torchvision.transforms steps for preprocessing each image inside my training/validation datasets.

My main issue is that each image from training/validation has a different size (e.g. 224x400, 150x300, 300x150, 224x224, etc.). Since the classification model I’m training is very sensitive to the shape of the object in the image, I can’t just use a simple torchvision.transforms.Resize(); I need to use padding to maintain the proportions of the objects.

Is there a simple way to add a padding step into a torchvision.transforms.Compose() pipeline (ensuring that every image is 224x224, without cropping the image, only doing a resize and padding)? *each image has a different original shape/size

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = "./my_data_dir/"
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                             shuffle=True, num_workers=8)
              for x in ['train', 'val']}

Reading the torchvision.transforms.Pad() documentation, I’ve understood that I need to know the size of the padding beforehand, and know if it will be applied on left/right or top/bottom, before using this transform.

Is there a simple way to add a step on this transforms.Compose() to infer the image size, so I can get the parameters that I need to configure my torchvision.transforms.Pad()?

satrya-sabeni · March 3, 2020, 11:15pm

I think I did something similar: I kept all the aspect ratios by padding every image to the width and height of the biggest image, placing each image in the center and padding the empty space. I don’t know if there’s a function that does this automatically, so I wrote functions like the ones below in my data class, which works for me. After this you should be able to resize and keep the ratios with the padding.

def get_padding(image):
    # width and height of the biggest image in my dataset
    max_w = 1203
    max_h = 1479
    
    imsize = image.size
    h_padding = (max_w - imsize[0]) / 2
    v_padding = (max_h - imsize[1]) / 2
    l_pad = h_padding if h_padding % 1 == 0 else h_padding+0.5
    t_pad = v_padding if v_padding % 1 == 0 else v_padding+0.5
    r_pad = h_padding if h_padding % 1 == 0 else h_padding-0.5
    b_pad = v_padding if v_padding % 1 == 0 else v_padding-0.5
    
    padding = (int(l_pad), int(t_pad), int(r_pad), int(b_pad))
    
    return padding

def pad_image(self, image):
    # pad is torchvision.transforms.functional.pad
    padded_im = pad(image, get_padding(image))
    return padded_im

Note: I wrote this only to make it work, so it’s not optimal or the ‘ideal way’, but it works for me.

nansravn · March 4, 2020, 8:52pm

Thanks @satrya-sabeni, based on your recommendation, I’ve prepared the following torchvision.transforms.Compose() pipeline:

from torchvision.transforms.functional import pad
from torchvision import transforms
import numpy as np
import numbers

def get_padding(image):    
    w, h = image.size
    max_wh = np.max([w, h])
    h_padding = (max_wh - w) / 2
    v_padding = (max_wh - h) / 2
    l_pad = h_padding if h_padding % 1 == 0 else h_padding+0.5
    t_pad = v_padding if v_padding % 1 == 0 else v_padding+0.5
    r_pad = h_padding if h_padding % 1 == 0 else h_padding-0.5
    b_pad = v_padding if v_padding % 1 == 0 else v_padding-0.5
    padding = (int(l_pad), int(t_pad), int(r_pad), int(b_pad))
    return padding

class NewPad(object):
    def __init__(self, fill=0, padding_mode='constant'):
        assert isinstance(fill, (numbers.Number, str, tuple))
        assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']

        self.fill = fill
        self.padding_mode = padding_mode
        
    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be padded.

        Returns:
            PIL Image: Padded image.
        """
        return pad(img, get_padding(img), self.fill, self.padding_mode)
    
    def __repr__(self):
        return self.__class__.__name__ + '(fill={0}, padding_mode={1})'.\
            format(self.fill, self.padding_mode)

data_transforms = {
    'train': transforms.Compose([
        NewPad(),
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        NewPad(),
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

mcPytorch · April 1, 2020, 9:40am

import torch.functional as F
F.pad

has been moved to

from torchvision.transforms.functional import pad

as far as I can see.

Could you explain the % 1 and +0.5 part of your code snippet?
Isn’t % 1 always 0 ?

EDIT: Wait, I can see it. The / 2 division returns a float rather than an integer, which leaves a 0.5 remainder whenever the amount of padding needed along a side is not divisible by 2.
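To make the rounding concrete, here is a minimal sketch of the same arithmetic in plain Python. The helper name split_padding is hypothetical (it is not part of the code posted above); it only mirrors the % 1 / +0.5 / -0.5 logic of get_padding for a single axis.

```python
def split_padding(target, size):
    """Split the padding needed along one axis into (before, after).

    Hypothetical helper mirroring get_padding: halve the difference and,
    if the half is fractional (x.5), give the extra pixel to the
    'before' side (left or top).
    """
    half = (target - size) / 2      # true division, so this is a float
    if half % 1 == 0:               # even difference: both sides are equal
        return int(half), int(half)
    # odd difference: x.5 is rounded up on one side and down on the other
    return int(half + 0.5), int(half - 0.5)


print(split_padding(300, 150))  # difference of 150 (even) -> (75, 75)
print(split_padding(300, 151))  # difference of 149 (odd)  -> (75, 74)
```

In both cases the two values add up to the full difference, so the padded side ends up exactly at the target length.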

weisunding · July 24, 2020, 3:20pm

This post saved my day, thanks! For simplicity, it is OK to ignore whether the image width/height is odd or even; here is the simplified code.

import numpy as np
import torchvision.transforms.functional as F
from torchvision import transforms

class SquarePad:
    def __call__(self, image):
        w, h = image.size  # PIL Image: (width, height)
        max_wh = np.max([w, h])
        hp = int((max_wh - w) / 2)
        vp = int((max_wh - h) / 2)
        padding = (hp, vp, hp, vp)
        return F.pad(image, padding, 0, 'constant')

# now use it as the replacement of transforms.Pad class
transform=transforms.Compose([
    SquarePad(),
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

MaiHai · November 17, 2020, 7:32am

Your SquarePad class works perfectly. Thank you!!

MaiAnd · January 12, 2021, 10:03pm

Mmmm, what I cannot fully understand is: the variable image_size is missing; how should I deal with the problem?

1chimaruGin · January 13, 2021, 10:18am

The code snippet above pads the image to a square whose side is the larger of the image’s width and height, e.g.:
input image: 180x240
resulting padded image: 240x240

Original image: cat

Padded image: padcat

ntomita · July 13, 2021, 8:56pm

I would extend @weisunding’s code to be more precise, as follows.

import torchvision.transforms.functional as F

class SquarePad:
    def __call__(self, image):
        max_wh = max(image.size)
        p_left, p_top = [(max_wh - s) // 2 for s in image.size]
        p_right, p_bottom = [max_wh - (s+pad) for s, pad in zip(image.size, [p_left, p_top])]
        padding = (p_left, p_top, p_right, p_bottom)
        return F.pad(image, padding, 0, 'constant')

target_image_size = (224, 224)  # as an example
# now use it as the replacement of transforms.Pad class
transform=transforms.Compose([
    SquarePad(),
    transforms.Resize(target_image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

The original code fails to return a square image when the difference between width and height is odd (for example, when exactly one side is odd). The code above guarantees a square image by computing the residuals (see the line starting with p_right). It also removes np to reduce dependencies, and CenterCrop, since it does nothing after a Resize to the same size.
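As a quick check of that claim, the sketch below compares the two padding computations in plain Python, without loading any image. The function names are hypothetical and only restate the arithmetic of the two SquarePad versions; the 300x151 size is an arbitrary example with an odd difference.

```python
def truncating_padding(w, h):
    """Same padding on both sides, as in the simplified SquarePad."""
    max_wh = max(w, h)
    hp = int((max_wh - w) / 2)
    vp = int((max_wh - h) / 2)
    return (hp, vp, hp, vp)


def residual_padding(w, h):
    """Floor half on left/top, the remainder on right/bottom."""
    max_wh = max(w, h)
    p_left, p_top = (max_wh - w) // 2, (max_wh - h) // 2
    p_right, p_bottom = max_wh - (w + p_left), max_wh - (h + p_top)
    return (p_left, p_top, p_right, p_bottom)


def padded_size(w, h, padding):
    """Size of a w x h image after (left, top, right, bottom) padding."""
    left, top, right, bottom = padding
    return (w + left + right, h + top + bottom)


w, h = 300, 151  # difference of 149, which is odd
print(padded_size(w, h, truncating_padding(w, h)))  # (300, 299): one pixel short
print(padded_size(w, h, residual_padding(w, h)))    # (300, 300): square
```

When the difference is even, both versions give the same padding, so the residual version only changes the result in the odd case.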

Samil · January 19, 2022, 11:57am

The F.pad function requires a torch tensor as input, but it is given a PIL Image here. It throws an error for me.

krishna.g · January 28, 2022, 7:57am

Use
import torchvision.transforms.functional as F
instead of
import torch.nn.functional as F

J_Johnson · January 5, 2023, 2:01pm

Good function but needs a little fix:

import torchvision.transforms.functional as F

class SquarePad:
    def __call__(self, image):
        s = image.size()  # tensor shape (..., H, W)
        max_wh = max(s[-1], s[-2])
        hp = int((max_wh - s[-1]) / 2)
        vp = int((max_wh - s[-2]) / 2)
        padding = (hp, vp, hp, vp)
        return F.pad(image, padding, 0, 'constant')

It was cropping on the wrong dim. Also, this deals with both grayscale and color images. Cheers.

Aziz_Ilyosov · March 28, 2023, 9:52am

It didn’t solve the issue

MatR · September 26, 2024, 2:42pm

FWIW, after finding this thread in 2024, it seems that using CenterCrop and setting the size to your desired output dimensions automatically pads the image and keeps it centered.
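One caveat worth checking before relying on this: as far as the torchvision documentation describes it, CenterCrop pads only along the edges where the image is smaller than the requested size and crops where it is larger, and it never resizes. The sketch below is plain-Python arithmetic, not a torchvision call; the helper center_crop_effect is hypothetical and assumes a square target size.

```python
def center_crop_effect(w, h, size):
    """Pixels padded and cropped per axis to reach a size x size output.

    Hypothetical helper: assumes CenterCrop pads an axis that is shorter
    than `size` and crops an axis that is longer, without any resizing.
    """
    pad_w, pad_h = max(size - w, 0), max(size - h, 0)
    crop_w, crop_h = max(w - size, 0), max(h - size, 0)
    return {"pad": (pad_w, pad_h), "crop": (crop_w, crop_h)}


# Smaller than the target on both axes: padding only, nothing is lost.
print(center_crop_effect(150, 200, 224))
# Wider than the target: 76 columns are cropped away, the height is padded.
print(center_crop_effect(300, 150, 224))
```

So for images that may exceed the target along either side, a pad-to-square step followed by Resize, as in the SquarePad examples above, still seems to be the way to avoid cropping.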