How to resize and pad in a torchvision.transforms.Compose()?

I’m creating a torchvision.datasets.ImageFolder() data loader, adding torchvision.transforms steps for preprocessing each image inside my training/validation datasets.

My main issue is that each image in my training/validation sets has a different size (e.g. 224x400, 150x300, 300x150, 224x224, etc.). Since the classification model I’m training is very sensitive to the shape of the object in the image, I can’t just use a simple torchvision.transforms.Resize(); I need padding to preserve the proportions of the objects.

Is there a simple way to add a padding step to a torchvision.transforms.Compose() pipeline, ensuring that every image ends up 224x224 without cropping, only resizing and padding? (Each image has a different original shape/size.)

import os
import torch
from torchvision import datasets, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = "./my_data_dir/"
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                             shuffle=True, num_workers=8)
              for x in ['train', 'val']}

Reading the torchvision.transforms.Pad() documentation, I understand that I need to know the padding size beforehand, and whether it will be applied on the left/right or top/bottom, before using this transform.

Is there a simple way to add a step to this transforms.Compose() that infers the image size, so I can get the parameters I need to configure torchvision.transforms.Pad()?


I think I did something similar: I kept all the aspect ratios by making every image the width and height of the biggest image, then centered the image and padded the empty space. I don’t know if there’s a function that does this automatically, but I wrote functions like the ones below in my dataset class, and they work for me. After this you should be able to resize and keep the ratios thanks to the padding.

from torchvision.transforms.functional import pad

def get_padding(image):
    # width/height of the biggest image in my dataset
    max_w = 1203
    max_h = 1479

    imsize = image.size  # PIL: (width, height)
    h_padding = (max_w - imsize[0]) / 2
    v_padding = (max_h - imsize[1]) / 2
    # when the difference is odd, put the extra pixel on the left/top
    l_pad = h_padding if h_padding % 1 == 0 else h_padding + 0.5
    t_pad = v_padding if v_padding % 1 == 0 else v_padding + 0.5
    r_pad = h_padding if h_padding % 1 == 0 else h_padding - 0.5
    b_pad = v_padding if v_padding % 1 == 0 else v_padding - 0.5

    return (int(l_pad), int(t_pad), int(r_pad), int(b_pad))

def pad_image(self, image):
    # method of my dataset class; pads the image to max_w x max_h
    padded_im = pad(image, get_padding(image))
    return padded_im

Note: I just wrote this to make it work, so it’s not optimal or the ‘ideal way’, but it works for me.
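For example (a minimal sketch, assuming get_padding() and pad from torchvision.transforms.functional are in scope), you could plug the padding into the pipeline with transforms.Lambda instead of calling it inside the dataset class:

from torchvision import transforms
from torchvision.transforms.functional import pad

transform = transforms.Compose([
    transforms.Lambda(lambda img: pad(img, get_padding(img))),  # pad each image to max_w x max_h
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])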


Thanks @satrya-sabeni, based on your recommendation, I’ve prepared the following torchvision.transforms.Compose() pipeline:

from torchvision.transforms.functional import pad
from torchvision import transforms
import numpy as np
import numbers

def get_padding(image):    
    w, h = image.size
    max_wh = np.max([w, h])
    h_padding = (max_wh - w) / 2
    v_padding = (max_wh - h) / 2
    l_pad = h_padding if h_padding % 1 == 0 else h_padding+0.5
    t_pad = v_padding if v_padding % 1 == 0 else v_padding+0.5
    r_pad = h_padding if h_padding % 1 == 0 else h_padding-0.5
    b_pad = v_padding if v_padding % 1 == 0 else v_padding-0.5
    padding = (int(l_pad), int(t_pad), int(r_pad), int(b_pad))
    return padding

class NewPad(object):
    def __init__(self, fill=0, padding_mode='constant'):
        assert isinstance(fill, (numbers.Number, str, tuple))
        assert padding_mode in ['constant', 'edge', 'reflect', 'symmetric']

        self.fill = fill
        self.padding_mode = padding_mode
        
    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be padded.

        Returns:
            PIL Image: Padded image.
        """
        return pad(img, get_padding(img), self.fill, self.padding_mode)
    
    def __repr__(self):
        return self.__class__.__name__ + '(fill={0}, padding_mode={1})'.\
            format(self.fill, self.padding_mode)

data_transforms = {
    'train': transforms.Compose([
        NewPad(),
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        NewPad(),
        transforms.Resize((224,224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
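
As a quick sanity check (illustrative only, assuming PIL is installed), NewPad should always return a square image whose side equals the longer side of the input:

from PIL import Image

img = Image.new('RGB', (150, 300))   # width=150, height=300
padded = NewPad()(img)
print(padded.size)                   # expected: (300, 300)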
A note for anyone copying this: the pad used above has to be torchvision’s, i.e.

from torchvision.transforms.functional import pad

and not torch.functional’s F.pad, as far as I can see.

Could you explain the % 1 and +0.5 parts of your code snippet?
Isn’t % 1 always 0?

EDIT: Wait, I see it now: the / 2 division returns a float rather than an integer, which gives .5 values whenever the size difference isn’t divisible by 2.
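
For instance, with an odd size difference the split works out like this (illustrative numbers only):

max_wh, w = 301, 224
h_padding = (max_wh - w) / 2                       # 38.5, not a whole number
l_pad, r_pad = h_padding + 0.5, h_padding - 0.5    # 39.0 on the left, 38.0 on the right
print(int(l_pad) + int(r_pad))                     # 77 == max_wh - w, so no pixel is lost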

This post saved my day, thanks! For simplicity it’s OK to ignore whether the image width/height is odd or even; here is the simplified code:

import numpy as np
import torchvision.transforms.functional as F
from torchvision import transforms

class SquarePad:
    def __call__(self, image):
        w, h = image.size
        max_wh = np.max([w, h])
        hp = int((max_wh - w) / 2)
        vp = int((max_wh - h) / 2)
        padding = (hp, vp, hp, vp)  # (left, top, right, bottom)
        return F.pad(image, padding, 0, 'constant')

# now use it as the replacement of the transforms.Pad class
transform = transforms.Compose([
    SquarePad(),
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

Your SquarePad class works perfectly. Thank you!!

Mmmm, the one thing I cannot fully understand: the variable image_size is missing. How should I deal with that?

The above code snippet pads to the maximum of the image’s width and height, e.g.:
input image: 180x240
resulting (padded) image: 240x240

[Original image: cat photo]

[Padded image: cat photo with padding]
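
You can reproduce that quickly with a dummy image (a sketch, assuming the SquarePad above):

from PIL import Image

img = Image.new('RGB', (180, 240))
print(SquarePad()(img).size)   # (240, 240)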

I would extend @weisunding’s code to be more precise, as follows.

import torchvision.transforms.functional as F
from torchvision import transforms

class SquarePad:
    def __call__(self, image):
        max_wh = max(image.size)
        # floor of the difference goes on the left/top ...
        p_left, p_top = [(max_wh - s) // 2 for s in image.size]
        # ... and the residual goes on the right/bottom, so the result is always square
        p_right, p_bottom = [max_wh - (s + pad) for s, pad in zip(image.size, [p_left, p_top])]
        padding = (p_left, p_top, p_right, p_bottom)
        return F.pad(image, padding, 0, 'constant')

target_image_size = (224, 224)  # as an example
# now use it as the replacement of the transforms.Pad class
transform = transforms.Compose([
    SquarePad(),
    transforms.Resize(target_image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

The original code fails to return a square image when the difference between width and height is odd. The code above guarantees a square image by computing the residuals (see the line starting with p_right). It also removes the numpy dependency and drops CenterCrop, since it does nothing after a Resize to the same size.
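
For example, with a 181x240 image (an odd difference of 59) the earlier version returns 239x240, while this one stays square (a quick sketch with a dummy image):

from PIL import Image

img = Image.new('RGB', (181, 240))
print(SquarePad()(img).size)   # (240, 240); the earlier version would give (239, 240)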


The F.pad function requires a torch tensor as input, but it is given a PIL Image here. It throws an error for me.


Use
import torchvision.transforms.functional as F
instead of
import torch.nn.functional as F

Good function but needs a little fix:

import numpy as np
import torchvision.transforms.functional as F

class SquarePad:
    def __call__(self, image):
        s = image.size()  # tensor shape: [..., H, W]
        max_wh = np.max([s[-1], s[-2]])
        hp = int((max_wh - s[-1]) / 2)  # horizontal padding (width is the last dim)
        vp = int((max_wh - s[-2]) / 2)  # vertical padding (height is the second-to-last dim)
        padding = (hp, vp, hp, vp)      # (left, top, right, bottom)
        return F.pad(image, padding, 0, 'constant')

It was padding along the wrong dim for tensor inputs. Also, this deals with both grayscale and color images. Cheers.
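
Since this version reads the size from a tensor, it would go after ToTensor() in the pipeline, e.g. (a sketch, assuming a torchvision version whose Resize/pad accept tensors):

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),            # PIL Image -> tensor of shape [C, H, W]
    SquarePad(),                      # pad the tensor to a square
    transforms.Resize((224, 224)),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])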

It didn’t solve the issue